Create the new analysis directories:
- general directory
- for plots
- for output of summary results
- for baseline tables
- for genetic analyses
- for Cox regression results
* General packages...
* Genomic packages...
For the ERA-CVD ‘druggable-MI-targets’ project (grant number: 01KL1802) we performed two related RNA sequencing (RNAseq) experiments:
- conventional (‘bulk’) RNAseq using RNA extracted from carotid plaque samples, n ≈ 700. As of Thursday, October 31, 2024, all samples have been selected and RNA has been extracted; quality control (QC) was performed, leaving a dataset of 635 samples. These data have since been expanded with a second conventional bulk RNAseq experiment of n ≈ 600 samples.
- single-cell RNAseq (scRNAseq) of at least n = 40 samples (20 females, 20 males). As of Thursday, October 31, 2024, data are available for 40 samples (3 females, 15 males); we are extending sampling to obtain more female samples.
Plaque samples are derived from carotid endarterectomies as part of the Athero-Express Biobank Study, an ongoing study at the UMC Utrecht.
In this notebook we set up the files for the bulk RNAseq analyses.
First we will load the data:
Here we load the latest datasets from our Athero-Express bulk RNA experiments.
- Athero-Express RNAseq Study 1 (AERNAS1), dated 2023-04-07: mapped against the cDNA reference of all transcripts in GRCh38.p13 and Ensembl 108 (GRCh38.p13/ENSEMBL_GENES_108, accessed on 18-01-2023). These data include raw read counts of all non-ribosomal, protein-coding genes with an existing HGNC gene name. All read counts are corrected for UMI sampling by
  raw.genecounts=round(-4096*(log(1-(raw.genecounts/4096))))
  (note that log here is the natural logarithm, i.e. ln). These data include the patients that passed the QC based on Mokry, M., Boltjes, A., Slenders, L. et al. Nat Cardiovasc Res 1, 1140–1155 (2022). File: AE_bulk_RNA_batch1.minRib.PC_07042023.txt.
- Athero-Express RNAseq Study 2 (AERNAS2): the other dataset, mapped on 2023-08-02. These data include raw read counts of all non-ribosomal, protein-coding genes with an existing HGNC gene name. All read counts are corrected for UMI sampling by
  raw.genecounts=round(-4096*(log(1-(raw.genecounts/4096))))
  (note that log here is the natural logarithm, i.e. ln). File: AE_bulk_RNA_batch2.minRib.PC_02082023.txt.
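The UMI correction quoted above can be tried out on a few made-up values. The sketch below is only an illustration of the formula, not the code used to produce the files; the function name `umi_correct` and the argument `n_umi` are hypothetical, and 4096 is simply the constant that appears in the formula. It also shows that a raw count equal to 4096 yields `Inf` (and anything larger yields `NaN`), which is one plausible source of non-finite counts.

```r
# Hypothetical helper reproducing the correction formula quoted in the text.
# n_umi = 4096 is the constant from the formula; it is an assumption that this
# is the size of the UMI pool.
umi_correct <- function(raw_counts, n_umi = 4096) {
  round(-n_umi * log(1 - raw_counts / n_umi))  # log() is the natural logarithm
}

# Made-up raw counts, from empty to fully saturated.
raw <- c(0, 10, 4000, 4096)
corrected <- umi_correct(raw)
print(data.frame(raw = raw, corrected = corrected))

# Low counts are left essentially unchanged, high counts are inflated to
# compensate for UMI collisions, and a count of exactly n_umi becomes Inf.
# Counts above n_umi give NaN (with a warning), suppressed here for clarity.
above_pool <- suppressWarnings(umi_correct(5000))
print(above_pool)
```

Because the correction grows without bound as the raw count approaches the constant, it is worth checking for non-finite values after applying it.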
In summary, these bulk RNAseq data are filtered (non-ribosomal, protein-coding genes with an existing HGNC gene name) and corrected for UMI sampling.
However, some analyses may require further pre-processing of the data. Usually, normalization for sequencing depth and quantile normalization are recommended.
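As a rough illustration of the two normalization steps mentioned above, here is a minimal base-R sketch on a made-up count matrix. It assumes counts-per-million (CPM) as the depth normalization and a plain rank-based quantile normalization; the function `quantile_normalize` is hypothetical and written out only to show the idea. For real analyses an established Bioconductor implementation is probably the better choice.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); all values are made up.
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 50, 15, 30),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Step 1: sequencing-depth normalization as counts per million (CPM).
# Each column is divided by its library size and scaled to one million.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Step 2: quantile normalization (hypothetical helper).
# Every sample is forced onto the same reference distribution, defined as the
# mean of the sorted values across samples.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each gene per sample
  sorted <- apply(x, 2, sort)                        # sorted values per sample
  reference <- rowMeans(sorted)                      # mean value at each rank
  out <- apply(ranks, 2, function(r) reference[r])   # map ranks back to reference
  dimnames(out) <- dimnames(x)
  out
}

qn <- quantile_normalize(log2(cpm + 1))
print(round(qn, 2))
```

After the second step every column of `qn` contains exactly the same set of values, only assigned to different genes according to each sample's ranking.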
# FIRST RUN DATA
# bulk RNAseq data; first run
# bulkRNA_counts_raw_qc_umicorr_firstrun <- fread(paste0(AERNA_loc,"/FIRSTRUN/raw_data_bulk/raw_counts_batch1till11_qc_umicorrected.txt"))
# bulk RNAseq data; re-run (deeper sequenced)
aernas1_counts_raw_qc_umicorr <- fread(paste0(AERNA_loc,"/RERUN/PROCESSED/AE_bulk_RNA_batch1.minRib.PC_07042023.txt")) # no ribosomal and only protein coding
# batch information
aernas1_meta <- fread(paste0(AERNA_loc,"/FIRSTRUN/raw_data_bulk/metadata_raw_counts_batch1till11.txt"))
# NEWRUN DATA
aernas2_counts_raw_qc_umicorr <- fread(paste0(AERNA_loc,"/NEWRUN/raw_data_bulk/AE_bulk_RNA_batch2.minRib.PC_02082023.txt")) # no ribosomal and only protein coding
# batch information
# aernas2_meta <- fread(paste0(AERNA_loc,"/NEWRUN/raw_data_bulk/"))
Quick peek at the counts and meta-data of the RNAseq experiment.
There are two small issues we need to address:
Inf and NA values
There are a couple of samples with infinite gene counts.
temp <- aernas1_counts_raw_qc_umicorr %>%
dplyr::mutate_if(is.numeric, as.integer)
cat("\nFixing the infinite gene counts.\n")
Fixing the infinite gene counts.
temp <- temp %>%
  mutate(across(is.numeric, ~replace_na(.x, max(.x, na.rm = TRUE)))) %>% # NA becomes the column maximum.
  dplyr::mutate(across(             # For every column you want...
    # everything(),                 # ...change all columns
    dplyr::starts_with("ae"),       # ...change all study number (sample) columns
    ~ dplyr::case_when(
      . == Inf  ~ max(.[is.finite(.)]), # +Inf becomes the finite max.
      . == -Inf ~ min(.[is.finite(.)]), # -Inf becomes the finite min.
      . == -0   ~ min(.[is.finite(.)]), # zero (R treats -0 as equal to 0) becomes the finite min.
      TRUE ~ .                          # Other values stay the same.
    )
  )
)
Warning: There was 1 warning in `.fun()`.
ℹ In argument: `across(is.numeric, ~replace_na(.x, max(.x, na.rm = TRUE)))`.
Caused by warning:
! Use of bare predicate functions was deprecated in tidyselect 1.1.0.
ℹ Please use wrap predicates in `where()` instead.
  # Was:
  data %>% select(is.numeric)
  # Now:
  data %>% select(where(is.numeric))
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
mutate: no changes
There are a couple of samples with infinite gene counts.
temp2 <- aernas2_counts_raw_qc_umicorr %>%
dplyr::mutate_if(is.numeric, as.integer)
cat("\nFixing the infinite gene counts.\n")
Fixing the infinite gene counts.
temp2 <- temp2 %>%
  mutate(across(is.numeric, ~replace_na(.x, max(.x, na.rm = TRUE)))) %>% # NA becomes the column maximum.
  dplyr::mutate(across(             # For every column you want...
    # everything(),                 # ...change all columns
    dplyr::starts_with("ae"),       # ...change all study number (sample) columns
    ~ dplyr::case_when(
      . == Inf  ~ max(.[is.finite(.)]), # +Inf becomes the finite max.
      . == -Inf ~ min(.[is.finite(.)]), # -Inf becomes the finite min.
      . == -0   ~ min(.[is.finite(.)]), # zero (R treats -0 as equal to 0) becomes the finite min.
      TRUE ~ .                          # Other values stay the same.
    )
  )
)
mutate: no changes
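The two cleaning steps above differ only in their input, so the same logic could be wrapped in one function. The following is a minimal base-R sketch under stated assumptions: the name `fix_nonfinite_counts` and its `sample_prefix` argument are hypothetical, the toy data are made up, and missing or infinite values are imputed per sample column with that column's finite maximum (or minimum for -Inf), which is one reading of the rule in the comments above rather than a reproduction of the notebook's code.

```r
# Hypothetical helper: replace NA/NaN and +/-Inf in the sample columns of a
# count table. Sample columns are assumed to start with `sample_prefix`.
fix_nonfinite_counts <- function(df, sample_prefix = "ae") {
  sample_cols <- grep(paste0("^", sample_prefix), names(df), value = TRUE)
  for (col in sample_cols) {
    x <- df[[col]]
    finite <- x[is.finite(x)]
    if (length(finite) == 0) next           # nothing to impute from; leave as-is
    x[is.na(x) | x == Inf] <- max(finite)   # NA, NaN and +Inf become the finite max
    x[x == -Inf] <- min(finite)             # -Inf becomes the finite min
    df[[col]] <- x
  }
  df
}

# Made-up example with one gene column and two sample columns.
toy <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                  ae1 = c(5, Inf, 0, NA),
                  ae2 = c(-Inf, 2, 7, 3))
cleaned <- fix_nonfinite_counts(toy)
print(cleaned)
```

Applied to both count tables, such a helper would keep the two datasets on exactly the same cleaning rule and make it easier to change that rule in one place.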
For annotations we use the annotables package from Stephen Turner.
The columns of interest are: entrez, symbol, chr, start, end, strand, biotype, and description.
Annotating AERNAS1 with b38.
# first run
names(temp)[names(temp) == "gene"] <- "ENSEMBL_gene_ID"
cat("\nAnnotating AERNAS2 with b38.\n")
Annotating AERNAS2 with b38.
# new run
names(temp2)[names(temp2) == "gene"] <- "ENSEMBL_gene_ID"
cat("\nChecking existence of duplicate ENSEMBL IDs - there shouldn't be any.\n")
Checking existence of duplicate ENSEMBL IDs - there shouldn't be any.
character(0)
character(0)
[1] 21835 655
aernas1_counts_raw_qc_umicorr_annot <- temp %>%
  # arrange(p.adjusted) %>%
  # head(20) %>%
  inner_join(grch38, by=c("ENSEMBL_gene_ID"="ensgene")) %>%
  # select(gene, estimate, p.adjusted, symbol, description) %>%
  relocate(entrez, symbol, chr, start, end, strand, biotype, description,
           .before = ae1) %>% # put everything before sample ae1
  dplyr::filter(duplicated(ENSEMBL_gene_ID) == FALSE)
inner_join: added 8 columns (entrez, symbol, chr, start, end, …)
            > rows only in x      (     0)
            > rows only in grch38 (52,540)
            > matched rows         22,578    (includes duplicates)
            >                     ========
            > rows total           22,578
relocate: columns reordered (ENSEMBL_gene_ID, entrez, symbol, chr, start, …)
character(0)
[1] 21843 472
aernas2_counts_raw_qc_umicorr_annot <- temp2 %>%
  # arrange(p.adjusted) %>%
  # head(20) %>%
  inner_join(grch38, by=c("ENSEMBL_gene_ID"="ensgene")) %>%
  # select(gene, estimate, p.adjusted, symbol, description) %>%
  relocate(entrez, symbol, chr, start, end, strand, biotype, description,
           .before = ae105) %>% # put everything before sample ae105
  dplyr::filter(duplicated(ENSEMBL_gene_ID) == FALSE)
inner_join: added 8 columns (entrez, symbol, chr, start, end, …)
            > rows only in x      (     0)
            > rows only in grch38 (52,532)
            > matched rows         22,586    (includes duplicates)
            >                     ========
            > rows total           22,586
relocate: columns reordered (ENSEMBL_gene_ID, entrez, symbol, chr, start, …)
character(0)
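The join output above reports more matched rows than there are genes, because an annotation table can hold several rows for one Ensembl ID; the duplicates are then dropped by keeping the first match. The base-R sketch below illustrates that pattern on made-up data. The gene IDs, symbols and the tables `counts_toy` and `annot_toy` are hypothetical, and `merge()` stands in for the dplyr `inner_join()` used in the notebook.

```r
# Made-up count table and annotation table.
counts_toy <- data.frame(ENSEMBL_gene_ID = c("ENSG_A", "ENSG_B", "ENSG_C"),
                         ae1 = c(3, 0, 12))
annot_toy <- data.frame(ensgene = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_D"),
                        entrez = c(1, 2, 3, 4),
                        symbol = c("GENEA", "GENEB", "GENEB", "GENED"))

# Inner join: only IDs present in both tables are kept.
# ENSG_B matches two annotation rows, so it appears twice in the result.
joined <- merge(counts_toy, annot_toy,
                by.x = "ENSEMBL_gene_ID", by.y = "ensgene")
print(nrow(joined))

# Keep the first row per Ensembl ID, mirroring filter(duplicated(...) == FALSE).
dedup <- joined[!duplicated(joined$ENSEMBL_gene_ID), ]

# Duplicate check: an empty character vector means every ID is unique.
remaining_dups <- dedup$ENSEMBL_gene_ID[duplicated(dedup$ENSEMBL_gene_ID)]
print(remaining_dups)
```

Note that keeping the first match silently picks one of several annotations for a gene (for instance one of several Entrez IDs); whether that is acceptable depends on the downstream analysis.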
We have collected the clinical data from the Athero-Express Biobank Study (AEDB), and the UMI-corrected, filtered bulk RNAseq data (bulkRNA_counts) with its meta-data (bulkRNA-meta).
Here we will clean up the data and create a SummarizedExperiment() object for downstream analyses and visualizations.
# match up with meta data of RNAseq experiment
aernas1_counts_raw_qc_umicorr_annotFilt <- aernas1_counts_raw_qc_umicorr_annot %>%
  drop_na(chr) %>% # remove rows that have no information of start, end, chromosome and/or strand
  dplyr::select(1:9, one_of(sort(as.character(AEDB.CEA.sampleList)))) # select gene expression of only patients in RNA-seq AE df, sort in same order as metadata study_number
drop_na: no rows removed
Warning: Unknown columns: `ae100`, `ae1001`, `ae1004`, `ae1010`, `ae1011`, `ae1012`, `ae1015`, `ae1017`, `ae1018`, `ae1019`, `ae102`, `ae1022`, `ae1025`, `ae103`, `ae1030`, `ae1033`, `ae104`, `ae1041`, `ae1045`, `ae1048`, `ae1049`, `ae105`, `ae1053`, `ae1057`, `ae1058`, `ae106`, `ae1065`, `ae1068`, `ae1071`, `ae1078`, `ae108`, `ae1080`, `ae1085`, `ae1086`, `ae1088`, `ae109`, `ae1095`, `ae11`, `ae110`, `ae1106`, `ae1108`, `ae111`, `ae1111`, `ae1113`, `ae1125`, `ae1126`, `ae1132`, `ae1133`, `ae1135`, `ae1145`, `ae115`, `ae1151`, `ae1153`, `ae1161`, `ae1162`, `ae1163`, `ae1166`, `ae1167`, `ae1169`, `ae1179`, `ae1181`, `ae1184`, `ae1185`, `ae1186`, `ae1189`, `ae119`, `ae1190`, `ae1191`, `ae1193`, `ae1194`, `ae1197`, `ae1198`, `ae120`, `ae1205`, `ae1206`, `ae1207`, `ae1210`, `ae1212`, `ae1216`, `ae1217`, `ae1219`, `ae1221`, `ae1222`, `ae1224`, `ae1228`, `ae1232`, `ae1233`, `ae1238`, `ae1239`, `ae124`, `ae1242`, `ae1243`, `ae1253`, `ae1254`, `ae1257`, `ae126`, `ae1262`, `ae1263`, `ae1264`, `ae1265`, `ae127`, `ae1271`, `ae1274`, `ae128`, `ae1282`, `ae1285`, `ae1288`, `ae129`, `ae1293`, `ae1294`, `ae1299`, `ae13`, `ae1300`, `ae1302`, `ae1304`, `ae131`, `ae1317`, `ae1322`, `ae1325`, `ae1332`, `ae1337`, `ae1341`, `ae1345`, `ae1347`, `ae1354`, `ae1357`, `ae1358`, `ae1361`, `ae1362`, `ae1370`, `ae1372`, `ae1373`, `ae1378`, `ae1383`, `ae1385`, `ae1386`, `ae1387`, `ae1391`, `ae1394`, `ae1397`, `ae1398`, `ae1403`, `ae1404`, `ae1405`, `ae1406`, `ae1428`, `ae143`, `ae1436`, `ae1441`, `ae1443`, `ae1445`, `ae1447`, `ae145`, `ae1451`, `ae1454`, `ae146`, `ae1460`, `ae1462`, `ae1464`, `ae1466`, `ae1468`, `ae1469`, `ae1470`, `ae1471`, `ae1472`, `ae1478`, `ae1479`, `ae1489`, `ae149`, `ae1490`, `ae1494`, `ae1498`, `ae1499`, `ae1505`, `ae1509`, `ae151`, `ae1523`, `ae1525`, `ae1529`, `ae1540`, 
`ae1544`, `ae1545`, `ae1549`, `ae1552`, `ae1553`, `ae1571`, `ae1573`, `ae1574`, `ae1578`, `ae1585`, `ae1586`, `ae1589`, `ae1590`, `ae1593`, `ae1594`, `ae1617`, `ae1619`, `ae1622`, `ae1635`, `ae1636`, `ae1639`, `ae1645`, `ae1647`, `ae165`, `ae1650`, `ae1652`, `ae1654`, `ae1656`, `ae1657`, `ae1659`, `ae166`, `ae1668`, `ae1671`, `ae1673`, `ae1682`, `ae1687`, `ae1688`, `ae1689`, `ae1690`, `ae1691`, `ae1696`, `ae1699`, `ae1701`, `ae1702`, `ae1703`, `ae1704`, `ae1707`, `ae1708`, `ae1710`, `ae1711`, `ae1713`, `ae1716`, `ae1717`, `ae1719`, `ae1721`, `ae1723`, `ae1724`, `ae1727`, `ae1731`, `ae1737`, `ae1738`, `ae1739`, `ae174`, `ae1740`, `ae1744`, `ae1749`, `ae175`, `ae1754`, `ae1755`, `ae1758`, `ae1759`, `ae1763`, `ae1764`, `ae1765`, `ae1768`, `ae1773`, `ae1775`, `ae1776`, `ae1783`, `ae179`, `ae1790`, `ae1792`, `ae1795`, `ae1796`, `ae1798`, `ae1800`, `ae1816`, `ae182`, `ae1821`, `ae1822`, `ae1823`, `ae1827`, `ae183`, `ae1832`, `ae1837`, `ae1838`, `ae1840`, `ae1850`, `ae1852`, `ae1855`, `ae1856`, `ae1857`, `ae186`, `ae1861`, `ae1869`, `ae1873`, `ae188`, `ae1889`, `ae1891`, `ae1895`, `ae1896`, `ae19`, `ae1902`, `ae1903`, `ae1905`, `ae191`, `ae1913`, `ae1914`, `ae1919`, `ae192`, `ae1924`, `ae1933`, `ae194`, `ae1940`, `ae1943`, `ae1945`, `ae1948`, `ae1954`, `ae1955`, `ae1956`, `ae1959`, `ae1963`, `ae1969`, `ae1976`, `ae1977`, `ae1979`, `ae1980`, `ae1985`, `ae200`, `ae2003`, `ae2010`, `ae2013`, `ae2018`, `ae2019`, `ae2023`, `ae2030`, `ae2033`, `ae2037`, `ae2039`, `ae2042`, `ae2043`, `ae2051`, `ae206`, `ae2065`, `ae2067`, `ae2068`, `ae2073`, `ae2081`, `ae2082`, `ae2086`, `ae209`, `ae2091`, `ae2103`, `ae2105`, `ae2107`, `ae2109`, `ae2110`, `ae2113`, `ae2115`, `ae2118`, `ae2122`, `ae2123`, `ae2125`, `ae2127`, `ae2129`, `ae2133`, `ae2135`, `ae2136`, `ae2137`, `ae214`, `ae2140`, `ae2141`, `ae2144`, `ae2146`, `ae2148`, `ae2149`, `ae215`, `ae2152`, `ae2156`, `ae2158`, `ae216`, `ae2160`, `ae2163`, `ae2164`, `ae2170`, `ae2173`, `ae2176`, `ae2181`, `ae2183`, `ae2184`, `ae2186`, `ae2187`, 
`ae2188`, `ae2189`, `ae2190`, `ae2192`, `ae2193`, `ae2194`, `ae2196`, `ae2197`, `ae2198`, `ae22`, `ae220`, `ae2200`, `ae2204`, `ae2208`, `ae221`, `ae2211`, `ae2213`, `ae2217`, `ae2218`, `ae2229`, `ae2249`, `ae225`, `ae2252`, `ae2253`, `ae2255`, `ae2259`, `ae2261`, `ae2263`, `ae2265`, `ae2268`, `ae227`, `ae2270`, `ae2271`, `ae2272`, `ae2274`, `ae2279`, `ae2280`, `ae2281`, `ae2282`, `ae2283`, `ae2289`, `ae2292`, `ae2293`, `ae2298`, `ae23`, `ae2304`, `ae2306`, `ae2316`, `ae2319`, `ae2324`, `ae2325`, `ae2328`, `ae2337`, `ae234`, `ae2340`, `ae2343`, `ae2348`, `ae2349`, `ae2350`, `ae2355`, `ae2357`, `ae2359`, `ae2360`, `ae2361`, `ae2362`, `ae2363`, `ae2364`, `ae2365`, `ae2366`, `ae2367`, `ae237`, `ae2370`, `ae2371`, `ae2373`, `ae2374`, `ae2375`, `ae2376`, `ae2378`, `ae2379`, `ae238`, `ae2380`, `ae2381`, `ae2385`, `ae2387`, `ae2389`, `ae239`, `ae2393`, `ae2394`, `ae240`, `ae2400`, `ae2403`, `ae2405`, `ae2409`, `ae241`, `ae2411`, `ae2416`, `ae2417`, `ae2420`, `ae2422`, `ae2423`, `ae2424`, `ae2425`, `ae2427`, `ae2428`, `ae2429`, `ae2430`, `ae2432`, `ae2438`, `ae2439`, `ae244`, `ae2444`, `ae2445`, `ae2447`, `ae2448`, `ae2449`, `ae245`, `ae2453`, `ae2455`, `ae2457`, `ae2458`, `ae246`, `ae2462`, `ae2465`, `ae2466`, `ae2467`, `ae247`, `ae248`, `ae2483`, `ae2487`, `ae2488`, `ae249`, `ae2496`, `ae2498`, `ae2499`, `ae250`, `ae2500`, `ae2502`, `ae2503`, `ae2504`, `ae2505`, `ae2506`, `ae2507`, `ae2508`, `ae251`, `ae252`, `ae2525`, `ae2528`, `ae2529`, `ae253`, `ae2532`, `ae2533`, `ae2535`, `ae2536`, `ae2538`, `ae2539`, `ae254`, `ae2540`, `ae2541`, `ae2542`, `ae2543`, `ae2545`, `ae2546`, `ae255`, `ae2551`, `ae2555`, `ae2561`, `ae2569`, `ae2570`, `ae2572`, `ae2577`, `ae2586`, `ae2589`, `ae259`, `ae2597`, `ae2601`, `ae2604`, `ae2605`, `ae2608`, `ae2609`, `ae2612`, `ae2613`, `ae2615`, `ae2616`, `ae2617`, `ae2618`, `ae262`, `ae2621`, `ae2622`, `ae2625`, `ae2626`, `ae2627`, `ae2628`, `ae2629`, `ae263`, `ae2632`, `ae2634`, `ae2635`, `ae2637`, `ae2639`, `ae2640`, `ae2642`, `ae2643`, 
`ae2645`, `ae2646`, `ae2647`, `ae2648`, `ae2649`, `ae265`, `ae2650`, `ae2652`, `ae2653`, `ae2655`, `ae2658`, `ae2659`, `ae2666`, `ae2667`, `ae2668`, `ae2669`, `ae2670`, `ae2672`, `ae2673`, `ae2674`, `ae2675`, `ae2676`, `ae2677`, `ae2679`, `ae268`, `ae2680`, `ae2682`, `ae2683`, `ae2684`, `ae2685`, `ae2687`, `ae2688`, `ae2689`, `ae269`, `ae2690`, `ae2691`, `ae2692`, `ae2693`, `ae2696`, `ae2697`, `ae2698`, `ae2699`, `ae27`, `ae270`, `ae2702`, `ae2708`, `ae271`, `ae2714`, `ae2715`, `ae272`, `ae2720`, `ae2723`, `ae2726`, `ae2727`, `ae2728`, `ae273`, `ae2732`, `ae2735`, `ae2737`, `ae274`, `ae2741`, `ae2742`, `ae2743`, `ae2744`, `ae2745`, `ae2746`, `ae2747`, `ae2748`, `ae2749`, `ae2750`, `ae2751`, `ae2753`, `ae2754`, `ae2755`, `ae2756`, `ae2757`, `ae2760`, `ae2761`, `ae2762`, `ae2763`, `ae2764`, `ae2766`, `ae2769`, `ae2770`, `ae2774`, `ae278`, `ae2785`, `ae279`, `ae2796`, `ae28`, `ae2806`, `ae2807`, `ae2809`, `ae2810`, `ae282`, `ae2823`, `ae2824`, `ae2829`, `ae283`, `ae2832`, `ae2833`, `ae2838`, `ae284`, `ae2841`, `ae2848`, `ae2849`, `ae285`, `ae2858`, `ae286`, `ae2867`, `ae287`, `ae2871`, `ae2875`, `ae2878`, `ae288`, `ae2881`, `ae2883`, `ae2884`, `ae2889`, `ae2891`, `ae29`, `ae2900`, `ae2903`, `ae2904`, `ae2905`, `ae2908`, `ae2909`, `ae291`, `ae2915`, `ae2918`, `ae292`, `ae2924`, `ae2925`, `ae2929`, `ae2934`, `ae2935`, `ae2936`, `ae2938`, `ae2943`, `ae2947`, `ae2948`, `ae2950`, `ae2952`, `ae2953`, `ae2955`, `ae2956`, `ae2957`, `ae296`, `ae2961`, `ae2963`, `ae2965`, `ae2966`, `ae2967`, `ae2969`, `ae297`, `ae2970`, `ae2971`, `ae2972`, `ae2973`, `ae2974`, `ae2975`, `ae2976`, `ae2978`, `ae298`, `ae2980`, `ae2981`, `ae2982`, `ae2983`, `ae2987`, `ae2990`, `ae2991`, `ae2995`, `ae2996`, `ae2998`, `ae2999`, `ae3`, `ae30`, `ae3007`, `ae3009`, `ae301`, `ae3010`, `ae3011`, `ae3012`, `ae3014`, `ae3015`, `ae3018`, `ae3020`, `ae3022`, `ae3023`, `ae3024`, `ae3031`, `ae3034`, `ae3037`, `ae3042`, `ae3043`, `ae3046`, `ae3047`, `ae3049`, `ae3050`, `ae3052`, `ae3053`, `ae3058`, `ae3059`, 
`ae3073`, `ae308`, `ae3080`, `ae3081`, `ae3085`, `ae3088`, `ae3097`, `ae3100`, `ae3101`, `ae3102`, `ae3103`, `ae3105`, `ae3106`, `ae3108`, `ae3109`, `ae311`, `ae3110`, `ae3112`, `ae3114`, `ae3115`, `ae3116`, `ae3117`, `ae3118`, `ae3119`, `ae312`, `ae3120`, `ae3121`, `ae3124`, `ae3125`, `ae3126`, `ae3127`, `ae3128`, `ae3134`, `ae3135`, `ae3136`, `ae3138`, `ae3141`, `ae3142`, `ae3143`, `ae3144`, `ae3145`, `ae3147`, `ae3148`, `ae315`, `ae3150`, `ae3151`, `ae3153`, `ae3154`, `ae3155`, `ae3156`, `ae3157`, `ae3162`, `ae3166`, `ae3168`, `ae317`, `ae3170`, `ae3171`, `ae3174`, `ae3175`, `ae3176`, `ae3177`, `ae3178`, `ae3186`, `ae3188`, `ae3189`, `ae319`, `ae3190`, `ae3191`, `ae3193`, `ae3197`, `ae3201`, `ae3202`, `ae321`, `ae3217`, `ae3220`, `ae3221`, `ae3228`, `ae3232`, `ae3235`, `ae324`, `ae3240`, `ae3242`, `ae3243`, `ae3245`, `ae3246`, `ae3251`, `ae3252`, `ae3253`, `ae3254`, `ae3255`, `ae3256`, `ae3257`, `ae3258`, `ae326`, `ae3260`, `ae3261`, `ae3262`, `ae3263`, `ae3264`, `ae3265`, `ae3266`, `ae3267`, `ae3268`, `ae3269`, `ae327`, `ae3270`, `ae3277`, `ae3280`, `ae3282`, `ae3286`, `ae3287`, `ae3288`, `ae329`, `ae3291`, `ae3292`, `ae3294`, `ae3295`, `ae3296`, `ae3297`, `ae3298`, `ae3299`, `ae33`, `ae330`, `ae3301`, `ae3302`, `ae3305`, `ae3306`, `ae331`, `ae3311`, `ae3312`, `ae3316`, `ae3317`, `ae3318`, `ae3319`, `ae3328`, `ae333`, `ae3330`, `ae3331`, `ae3332`, `ae3334`, `ae3335`, `ae3336`, `ae334`, `ae3342`, `ae3344`, `ae3346`, `ae335`, `ae3350`, `ae3351`, `ae3354`, `ae3357`, `ae3360`, `ae3365`, `ae3369`, `ae3372`, `ae3380`, `ae3381`, `ae3383`, `ae3384`, `ae3386`, `ae3391`, `ae34`, `ae340`, `ae3401`, `ae3402`, `ae3403`, `ae3404`, `ae3405`, `ae3406`, `ae3407`, `ae3408`, `ae3409`, `ae3410`, `ae3411`, `ae3412`, `ae3413`, `ae3414`, `ae3415`, `ae3416`, `ae3417`, `ae3419`, `ae3420`, `ae3421`, `ae3422`, `ae3427`, `ae3428`, `ae3429`, `ae3430`, `ae3431`, `ae3432`, `ae3433`, `ae3434`, `ae3435`, `ae3440`, `ae3441`, `ae3442`, `ae3443`, `ae3444`, `ae3445`, `ae3446`, `ae3447`, `ae3448`, 
`ae3449`, `ae3451`, `ae3455`, `ae3456`, `ae3473`, `ae3483`, `ae3486`, `ae3488`, `ae3489`, `ae349`, `ae3492`, `ae3496`, `ae3497`, `ae3499`, `ae35`, `ae3501`, `ae3502`, `ae3504`, `ae3505`, `ae3508`, `ae3514`, `ae3516`, `ae3518`, `ae3519`, `ae3520`, `ae3521`, `ae3524`, `ae3530`, `ae3532`, `ae3534`, `ae3538`, `ae354`, `ae3543`, `ae3544`, `ae3545`, `ae355`, `ae3558`, `ae3561`, `ae3564`, `ae3568`, `ae3571`, `ae3573`, `ae3574`, `ae3575`, `ae3576`, `ae3579`, `ae3582`, `ae3594`, `ae3595`, `ae3596`, `ae3597`, `ae3599`, `ae3602`, `ae3604`, `ae3606`, `ae3607`, `ae3609`, `ae3610`, `ae3611`, `ae3612`, `ae3613`, `ae3614`, `ae3615`, `ae3616`, `ae3617`, `ae3618`, `ae3619`, `ae362`, `ae3620`, `ae3627`, `ae3629`, `ae363`, `ae3637`, `ae3638`, `ae3641`, `ae3643`, `ae3645`, `ae3646`, `ae3647`, `ae3654`, `ae366`, `ae3660`, `ae3662`, `ae3673`, `ae3674`, `ae3675`, `ae3676`, `ae3678`, `ae3679`, `ae3680`, `ae3681`, `ae3682`, `ae3687`, `ae3689`, `ae3690`, `ae3691`, `ae3693`, `ae3694`, `ae3697`, `ae3699`, `ae37`, `ae370`, `ae3701`, `ae3702`, `ae3703`, `ae3709`, `ae3717`, `ae3718`, `ae3720`, `ae3722`, `ae3724`, `ae3727`, `ae3729`, `ae3731`, `ae3733`, `ae3734`, `ae3738`, `ae3740`, `ae3741`, `ae3746`, `ae3747`, `ae3753`, `ae3754`, `ae3755`, `ae3756`, `ae3757`, `ae3759`, `ae376`, `ae3760`, `ae3762`, `ae3763`, `ae3764`, `ae3767`, `ae3769`, `ae3770`, `ae3771`, `ae3772`, `ae3773`, `ae3774`, `ae3775`, `ae3776`, `ae3777`, `ae3778`, `ae3780`, `ae3781`, `ae3782`, `ae3783`, `ae3784`, `ae3785`, `ae3786`, `ae3787`, `ae3788`, `ae3789`, `ae379`, `ae3790`, `ae3791`, `ae3792`, `ae3793`, `ae3794`, `ae3795`, `ae3796`, `ae3797`, `ae3800`, `ae3805`, `ae3806`, `ae3807`, `ae3809`, `ae3816`, `ae3817`, `ae3820`, `ae3822`, `ae3826`, `ae3828`, `ae3829`, `ae3833`, `ae3834`, `ae3835`, `ae3837`, `ae3838`, `ae3839`, `ae3840`, `ae3843`, `ae3844`, `ae3845`, `ae3846`, `ae3847`, `ae3848`, `ae3851`, `ae3852`, `ae3855`, `ae3856`, `ae3857`, `ae3866`, `ae3868`, `ae3869`, `ae3876`, `ae3877`, `ae3878`, `ae388`, `ae3880`, `ae3881`, 
`ae3882`, `ae3884`, `ae3885`, `ae3886`, `ae3887`, `ae3888`, `ae3889`, `ae3890`, `ae3891`, `ae3892`, `ae3893`, `ae3894`, `ae3895`, `ae3896`, `ae3899`, `ae3900`, `ae3901`, `ae3903`, `ae3904`, `ae3905`, `ae3906`, `ae3907`, `ae3909`, `ae3910`, `ae3914`, `ae3916`, `ae3917`, `ae3918`, `ae3919`, `ae3920`, `ae3921`, `ae3922`, `ae3924`, `ae3935`, `ae3936`, `ae394`, `ae3946`, `ae3953`, `ae3956`, `ae3957`, `ae3962`, `ae397`, `ae3979`, `ae398`, `ae3982`, `ae3986`, `ae399`, `ae3994`, `ae4`, `ae40`, `ae400`, `ae4000`, `ae4001`, `ae4002`, `ae4003`, `ae4004`, `ae4005`, `ae4006`, `ae4007`, `ae4008`, `ae4009`, `ae401`, `ae4010`, `ae4011`, `ae4012`, `ae4015`, `ae4016`, `ae402`, `ae4021`, `ae4022`, `ae4029`, `ae403`, `ae4031`, `ae4037`, `ae404`, `ae4045`, `ae4048`, `ae4049`, `ae405`, `ae4050`, `ae407`, `ae408`, `ae410`, `ae4100`, `ae4101`, `ae4102`, `ae4103`, `ae4104`, `ae4105`, `ae4106`, `ae4108`, `ae4109`, `ae4110`, `ae4111`, `ae4112`, `ae4113`, `ae4114`, `ae4115`, `ae4116`, `ae4117`, `ae4118`, `ae4119`, `ae4120`, `ae4121`, `ae4122`, `ae4123`, `ae4124`, `ae4125`, `ae4126`, `ae4127`, `ae4128`, `ae4129`, `ae413`, `ae4130`, `ae4131`, `ae4132`, `ae4133`, `ae4134`, `ae4135`, `ae4136`, `ae4137`, `ae4138`, `ae4139`, `ae414`, `ae4141`, `ae4143`, `ae4145`, `ae4146`, `ae4151`, `ae4157`, `ae416`, `ae4161`, `ae4162`, `ae4163`, `ae4164`, `ae4165`, `ae4166`, `ae4167`, `ae4168`, `ae4169`, `ae4170`, `ae4171`, `ae4172`, `ae4173`, `ae4174`, `ae4175`, `ae4176`, `ae4177`, `ae4178`, `ae4179`, `ae4181`, `ae4182`, `ae4183`, `ae4186`, `ae4187`, `ae4188`, `ae4189`, `ae4190`, `ae4191`, `ae4192`, `ae4193`, `ae4194`, `ae4195`, `ae4196`, `ae4198`, `ae4199`, `ae42`, `ae420`, `ae4200`, `ae4201`, `ae4203`, `ae4204`, `ae4205`, `ae4206`, `ae4208`, `ae4209`, `ae421`, `ae4211`, `ae4212`, `ae4215`, `ae4216`, `ae4218`, `ae4219`, `ae422`, `ae4220`, `ae4221`, `ae4222`, `ae4227`, `ae4228`, `ae4229`, `ae423`, `ae4231`, `ae4234`, `ae4235`, `ae4236`, `ae4237`, `ae4243`, `ae4244`, `ae4245`, `ae4247`, `ae4248`, `ae4249`, 
`ae4250`, `ae4251`, `ae4252`, `ae4253`, `ae4254`, `ae4255`, `ae4256`, `ae4257`, `ae4259`, `ae4260`, `ae4261`, `ae4264`, `ae4265`, `ae4266`, `ae4267`, `ae4268`, `ae4270`, `ae4271`, `ae4274`, `ae4275`, `ae4276`, `ae4278`, `ae4279`, `ae428`, `ae4281`, `ae4282`, `ae4283`, `ae4284`, `ae4285`, `ae4286`, `ae4289`, `ae4293`, `ae4295`, `ae4296`, `ae4298`, `ae4299`, `ae43`, `ae430`, `ae4300`, `ae4305`, `ae4306`, `ae4307`, `ae4308`, `ae4309`, `ae4310`, `ae4313`, `ae4314`, `ae4315`, `ae4316`, `ae4317`, `ae4318`, `ae4320`, `ae4321`, `ae4322`, `ae4323`, `ae4324`, `ae4325`, `ae4326`, `ae4327`, `ae4328`, `ae4329`, `ae4330`, `ae4331`, `ae4332`, `ae4333`, `ae4334`, `ae4335`, `ae4337`, `ae4339`, `ae4340`, `ae4342`, `ae435`, `ae4351`, `ae4353`, `ae4354`, `ae4355`, `ae4356`, `ae4357`, `ae4359`, `ae4360`, `ae4361`, `ae4362`, `ae4363`, `ae4365`, `ae4366`, `ae4367`, `ae4368`, `ae4369`, `ae4370`, `ae4371`, `ae4373`, `ae4374`, `ae4375`, `ae4376`, `ae4377`, `ae4378`, `ae4379`, `ae438`, `ae4380`, `ae4382`, `ae4383`, `ae4384`, `ae4385`, `ae4386`, `ae4387`, `ae4388`, `ae4389`, `ae439`, `ae4390`, `ae4395`, `ae4396`, `ae4398`, `ae44`, `ae4404`, `ae4408`, `ae441`, `ae4410`, `ae4411`, `ae4412`, `ae4413`, `ae4414`, `ae4415`, `ae4416`, `ae4417`, `ae4418`, `ae4420`, `ae4421`, `ae4422`, `ae4423`, `ae4424`, `ae4425`, `ae4426`, `ae4427`, `ae4429`, `ae4430`, `ae4431`, `ae4432`, `ae4433`, `ae4434`, `ae4435`, `ae4436`, `ae4437`, `ae4438`, `ae4439`, `ae4440`, `ae4441`, `ae4442`, `ae4443`, `ae4444`, `ae4445`, `ae4446`, `ae4447`, `ae4448`, `ae4449`, `ae4450`, `ae4451`, `ae4452`, `ae4453`, `ae4454`, `ae4455`, `ae4456`, `ae4457`, `ae4458`, `ae4459`, `ae446`, `ae4463`, `ae4469`, `ae4470`, `ae4471`, `ae4472`, `ae4473`, `ae4474`, `ae4475`, `ae4476`, `ae4477`, `ae4478`, `ae4479`, `ae448`, `ae4480`, `ae4481`, `ae4482`, `ae4483`, `ae4484`, `ae4485`, `ae4486`, `ae4487`, `ae4488`, `ae4489`, `ae4490`, `ae4491`, `ae4493`, `ae4495`, `ae4496`, `ae4497`, `ae4498`, `ae45`, `ae4500`, `ae4501`, `ae4502`, `ae4512`, `ae4513`, 
`ae4514`, `ae4515`, `ae4517`, `ae4518`, `ae4519`, `ae4520`, `ae4521`, `ae4523`, `ae4525`, `ae4526`, `ae4527`, `ae4528`, `ae4529`, `ae4530`, `ae4532`, `ae4534`, `ae4535`, `ae4536`, `ae4539`, `ae4540`, `ae4541`, `ae4542`, `ae4543`, `ae4544`, `ae4545`, `ae4546`, `ae4547`, `ae4548`, `ae4549`, `ae4550`, `ae4551`, `ae4552`, `ae4553`, `ae4554`, `ae4555`, `ae4556`, `ae4558`, `ae4559`, `ae4562`, `ae4563`, `ae4567`, `ae4569`, `ae4571`, `ae4572`, `ae4573`, `ae4575`, `ae4576`, `ae4577`, `ae4578`, `ae4579`, `ae458`, `ae4580`, `ae4581`, `ae4582`, `ae4583`, `ae4584`, `ae4585`, `ae4588`, `ae4589`, `ae459`, `ae4590`, `ae4591`, `ae4592`, `ae4593`, `ae4594`, `ae4595`, `ae4596`, `ae4597`, `ae4598`, `ae4599`, `ae4600`, `ae4601`, `ae4602`, `ae4603`, `ae4604`, `ae4605`, `ae4606`, `ae4607`, `ae4609`, `ae4610`, `ae462`, `ae4621`, `ae4622`, `ae4631`, `ae4632`, `ae4634`, `ae4635`, `ae4637`, `ae4638`, `ae4639`, `ae4640`, `ae4641`, `ae4642`, `ae4643`, `ae4644`, `ae4645`, `ae4646`, `ae4647`, `ae4648`, `ae4649`, `ae465`, `ae4650`, `ae4651`, `ae4652`, `ae4653`, `ae4654`, `ae4655`, `ae4656`, `ae4658`, `ae4659`, `ae466`, `ae4662`, `ae4664`, `ae4665`, `ae4666`, `ae4669`, `ae467`, `ae4670`, `ae4671`, `ae4672`, `ae4674`, `ae4675`, `ae4676`, `ae4677`, `ae4679`, `ae468`, `ae4681`, `ae4682`, `ae4683`, `ae4684`, `ae4685`, `ae4686`, `ae4688`, `ae4689`, `ae469`, `ae4690`, `ae4692`, `ae4693`, `ae4696`, `ae4697`, `ae4699`, `ae4702`, `ae4703`, `ae4704`, `ae4706`, `ae4707`, `ae4709`, `ae4710`, `ae4711`, `ae4712`, `ae4713`, `ae4714`, `ae4715`, `ae4717`, `ae4718`, `ae4720`, `ae4721`, `ae4722`, `ae4723`, `ae4725`, `ae4727`, `ae4728`, `ae4729`, `ae473`, `ae4730`, `ae4732`, `ae4733`, `ae4735`, `ae4736`, `ae4738`, `ae4739`, `ae4740`, `ae4741`, `ae4742`, `ae4744`, `ae4745`, `ae4746`, `ae4747`, `ae4749`, `ae475`, `ae4750`, `ae4756`, `ae4759`, `ae476`, `ae4762`, `ae4763`, `ae4764`, `ae4765`, `ae4767`, `ae4768`, `ae4769`, `ae4770`, `ae4771`, `ae4772`, `ae4773`, `ae4775`, `ae4776`, `ae4777`, `ae4779`, `ae478`, `ae4780`, 
`ae4781`, `ae4785`, `ae4788`, `ae4789`, `ae479`, `ae4793`, `ae4795`, `ae4796`, `ae4798`, `ae48`, `ae4800`, `ae4801`, `ae4802`, `ae4803`, `ae4804`, `ae4805`, `ae4806`, `ae482`, `ae483`, `ae485`, `ae486`, `ae487`, `ae489`, `ae49`, `ae491`, `ae494`, `ae495`, `ae496`, `ae498`, `ae5`, `ae503`, `ae507`, `ae510`, `ae511`, `ae514`, `ae516`, `ae520`, `ae522`, `ae525`, `ae528`, `ae530`, `ae531`, `ae54`, `ae540`, `ae542`, `ae546`, `ae549`, `ae551`, `ae552`, `ae555`, `ae557`, `ae561`, `ae562`, `ae564`, `ae565`, `ae57`, `ae572`, `ae573`, `ae574`, `ae583`, `ae585`, `ae59`, `ae591`, `ae593`, `ae599`, `ae6`, `ae60`, `ae609`, `ae613`, `ae615`, `ae624`, `ae627`, `ae635`, `ae637`, `ae641`, `ae643`, `ae644`, `ae648`, `ae65`, `ae655`, `ae660`, `ae663`, `ae665`, `ae67`, `ae670`, `ae671`, `ae679`, `ae68`, `ae681`, `ae684`, `ae685`, `ae69`, `ae698`, `ae699`, `ae706`, `ae707`, `ae71`, `ae72`, `ae723`, `ae725`, `ae729`, `ae737`, `ae738`, `ae74`, `ae747`, `ae75`, `ae751`, `ae752`, `ae763`, `ae764`, `ae765`, `ae768`, `ae77`, `ae779`, `ae78`, `ae781`, `ae79`, `ae793`, `ae797`, `ae8`, `ae808`, `ae81`, `ae813`, `ae815`, `ae820`, `ae821`, `ae826`, `ae830`, `ae832`, `ae837`, `ae838`, `ae839`, `ae84`, `ae843`, `ae850`, `ae852`, `ae855`, `ae857`, `ae86`, `ae861`, `ae866`, `ae873`, `ae875`, `ae88`, `ae884`, `ae885`, `ae888`, `ae89`, `ae890`, `ae897`, `ae900`, `ae905`, `ae906`, `ae909`, `ae916`, `ae919`, `ae92`, `ae922`, `ae930`, `ae932`, `ae939`, `ae942`, `ae943`, `ae947`, `ae96`, `ae960`, `ae969`, `ae97`, `ae972`, `ae978`, `ae98`, `ae981`, `ae982`, `ae986`, `ae987`, `ae989`, `ae990`, `ae992`, `ae994`
[1] 21835 631
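The long "Unknown columns" warning above arises because the clinical sample list contains IDs that are not columns of the count table. One way to avoid it, sketched here in base R with made-up IDs, is to intersect the requested IDs with the available columns first and to report the missing ones explicitly. The objects `count_table` and `sample_list` are hypothetical stand-ins for the annotated count table and the clinical sample list. Within dplyr, `any_of()` is a possible alternative to `one_of()` that ignores missing names without warning.

```r
# Made-up count table: one gene ID column and three sample columns.
count_table <- data.frame(ENSEMBL_gene_ID = c("ENSG_A", "ENSG_B"),
                          ae3 = c(1, 4),
                          ae10 = c(0, 2),
                          ae2 = c(7, 7))

# Hypothetical clinical sample list; two of these IDs have no RNAseq data.
sample_list <- c("ae2", "ae3", "ae99", "ae100")

# Requested samples that exist as columns, in sorted order of the sample list.
present <- intersect(sort(sample_list), names(count_table))

# Requested samples without a column in the count table.
missing <- setdiff(sample_list, names(count_table))

# Subset using only the columns that are known to exist.
subset_table <- count_table[, c("ENSEMBL_gene_ID", present)]

message(length(missing), " requested samples are not in the count table")
print(names(subset_table))
```

Keeping the `missing` vector around also gives a compact record of which patients lack expression data, instead of a warning that has to be read from the console.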
# match up with meta data of RNAseq experiment
aernas2_counts_raw_qc_umicorr_annotFilt <- aernas2_counts_raw_qc_umicorr_annot %>%
  drop_na(chr) %>% # remove rows that have no information of start, end, chromosome and/or strand
  dplyr::select(1:9, one_of(sort(as.character(AEDB.CEA.sampleList)))) # select gene expression of only patients in RNA-seq AE df, sort in same order as metadata study_number
drop_na: no rows removed
Warning: Unknown columns: `ae1`, `ae100`, `ae1001`, `ae1004`, `ae1010`, `ae1011`, `ae1012`, `ae1015`, `ae1017`, `ae1018`, `ae1019`, `ae102`, `ae1022`, `ae1025`, `ae1026`, `ae1029`, `ae103`, `ae1030`, `ae1032`, `ae1033`, `ae104`, `ae1041`, `ae1045`, `ae1048`, `ae1049`, `ae1053`, `ae1054`, `ae1055`, `ae1057`, `ae1058`, `ae106`, `ae1065`, `ae1066`, `ae1068`, `ae107`, `ae1071`, `ae1074`, `ae108`, `ae1080`, `ae1082`, `ae1085`, `ae1086`, `ae1088`, `ae109`, `ae1095`, `ae11`, `ae110`, `ae1100`, `ae1106`, `ae1108`, `ae111`, `ae1111`, `ae1113`, `ae112`, `ae1125`, `ae1126`, `ae113`, `ae1132`, `ae1133`, `ae1135`, `ae1139`, `ae114`, `ae1140`, `ae1145`, `ae115`, `ae1151`, `ae1153`, `ae1157`, `ae116`, `ae1160`, `ae1161`, `ae1162`, `ae1163`, `ae1166`, `ae1167`, `ae1169`, `ae117`, `ae1173`, `ae1174`, `ae1178`, `ae1179`, `ae1181`, `ae1182`, `ae1184`, `ae1185`, `ae1186`, `ae1188`, `ae1189`, `ae119`, `ae1190`, `ae1191`, `ae1193`, `ae1194`, `ae1197`, `ae1198`, `ae1199`, `ae12`, `ae120`, `ae1203`, `ae1205`, `ae1206`, `ae121`, `ae1210`, `ae1217`, `ae1218`, `ae1219`, `ae1221`, `ae1222`, `ae1224`, `ae1227`, `ae1229`, `ae123`, `ae1230`, `ae1231`, `ae1233`, `ae1237`, `ae1238`, `ae1239`, `ae124`, `ae1242`, `ae1243`, `ae1248`, `ae1249`, `ae125`, `ae1250`, `ae1253`, `ae1257`, `ae1258`, `ae126`, `ae1260`, `ae1262`, `ae1263`, `ae1265`, `ae1267`, `ae1268`, `ae127`, `ae1272`, `ae1282`, `ae1285`, `ae1286`, `ae1287`, `ae1288`, `ae129`, `ae1293`, `ae1294`, `ae1296`, `ae1297`, `ae1299`, `ae13`, `ae130`, `ae1300`, `ae1301`, `ae1302`, `ae1303`, `ae1305`, `ae1306`, `ae131`, `ae1311`, `ae1317`, `ae1319`, `ae1320`, `ae1321`, `ae1323`, `ae1324`, `ae1325`, `ae1326`, `ae1329`, `ae133`, `ae1331`, `ae1332`, `ae1334`, `ae1335`, `ae1337`, `ae1341`, `ae1342`, `ae1343`, `ae1344`, `ae1345`, `ae1347`, `ae1349`, `ae135`, 
`ae1352`, `ae1354`, `ae1357`, `ae1358`, `ae1359`, `ae136`, `ae1361`, `ae1362`, `ae1367`, `ae1369`, `ae137`, `ae1370`, `ae1372`, `ae1373`, `ae1374`, `ae1375`, `ae1376`, `ae1378`, `ae138`, `ae1380`, `ae1383`, `ae1385`, `ae1386`, `ae1387`, `ae1389`, `ae1390`, `ae1391`, `ae1394`, `ae1397`, `ae1398`, `ae140`, `ae1402`, `ae1403`, `ae1404`, `ae1407`, `ae1408`, `ae1409`, `ae1410`, `ae1411`, `ae1412`, `ae1413`, `ae1414`, `ae1420`, `ae1421`, `ae1422`, `ae1425`, `ae1426`, `ae1427`, `ae1429`, `ae1431`, `ae1439`, `ae144`, `ae1441`, `ae1443`, `ae1445`, `ae1446`, `ae1448`, `ae1449`, `ae1451`, `ae1452`, `ae1453`, `ae1454`, `ae146`, `ae1460`, `ae1461`, `ae1462`, `ae1466`, `ae1469`, `ae147`, `ae1482`, `ae1483`, `ae1484`, `ae1487`, `ae149`, `ae1494`, `ae1495`, `ae1497`, `ae1499`, `ae15`, `ae150`, `ae1500`, `ae1504`, `ae1505`, `ae1506`, `ae1509`, `ae151`, `ae1510`, `ae1514`, `ae1517`, `ae1519`, `ae1523`, `ae1524`, `ae1525`, `ae1527`, `ae1528`, `ae1529`, `ae1530`, `ae1531`, `ae1532`, `ae1534`, `ae1540`, `ae1544`, `ae1545`, `ae1547`, `ae1548`, `ae1549`, `ae1550`, `ae1552`, `ae1555`, `ae1558`, `ae1565`, `ae1567`, `ae1568`, `ae1571`, `ae1574`, `ae1576`, `ae1578`, `ae158`, `ae1585`, `ae1589`, `ae1593`, `ae1594`, `ae1597`, `ae1598`, `ae16`, `ae160`, `ae1601`, `ae1602`, `ae1604`, `ae1607`, `ae161`, `ae1616`, `ae1618`, `ae1622`, `ae1629`, `ae1630`, `ae1634`, `ae1635`, `ae1639`, `ae164`, `ae1642`, `ae1643`, `ae1648`, `ae165`, `ae1651`, `ae1652`, `ae1655`, `ae1656`, `ae1658`, `ae166`, `ae1661`, `ae1663`, `ae1668`, `ae1669`, `ae1671`, `ae1672`, `ae1673`, `ae1675`, `ae1676`, `ae1679`, `ae1682`, `ae1683`, `ae1685`, `ae1687`, `ae1688`, `ae1689`, `ae1692`, `ae170`, `ae1702`, `ae1703`, `ae1704`, `ae1705`, `ae1706`, `ae171`, `ae1710`, `ae1711`, `ae1712`, `ae1713`, `ae1714`, `ae1716`, `ae1717`, `ae1718`, `ae1719`, `ae172`, `ae1722`, `ae1724`, `ae1725`, `ae1727`, `ae1728`, `ae173`, `ae1731`, `ae1732`, `ae1735`, `ae1737`, `ae1739`, `ae174`, `ae1743`, `ae1744`, `ae1745`, `ae1747`, `ae1748`, `ae175`, 
`ae1750`, `ae1751`, `ae1754`, `ae1755`, `ae1756`, `ae176`, `ae1762`, `ae1763`, `ae1764`, `ae1765`, `ae1766`, `ae1770`, `ae1771`, `ae1773`, `ae1774`, `ae1775`, `ae1776`, `ae1778`, `ae178`, `ae1780`, `ae1781`, `ae1787`, `ae1788`, `ae1789`, `ae179`, `ae1793`, `ae1794`, `ae1795`, `ae1798`, `ae18`, `ae180`, `ae1800`, `ae1811`, `ae1815`, `ae1816`, `ae1818`, `ae182`, `ae1822`, `ae1823`, `ae1827`, `ae183`, `ae1831`, `ae1832`, `ae1836`, `ae1837`, `ae1838`, `ae1840`, `ae185`, `ae1852`, `ae1855`, `ae1857`, `ae1858`, `ae186`, `ae1861`, `ae1868`, `ae1870`, `ae1872`, `ae1873`, `ae1878`, `ae1879`, `ae188`, `ae1884`, `ae1889`, `ae189`, `ae1891`, `ae1897`, `ae1898`, `ae19`, `ae1902`, `ae1905`, `ae1907`, `ae1909`, `ae191`, `ae1910`, `ae1914`, `ae1915`, `ae1917`, `ae1918`, `ae1919`, `ae192`, `ae1925`, `ae1926`, `ae1928`, `ae1931`, `ae1932`, `ae1935`, `ae1938`, `ae1940`, `ae1945`, `ae1949`, `ae195`, `ae1950`, `ae1954`, `ae1956`, `ae196`, `ae1962`, `ae1963`, `ae1964`, `ae1968`, `ae197`, `ae1979`, `ae1980`, `ae1982`, `ae1985`, `ae1986`, `ae1987`, `ae1989`, `ae199`, `ae1990`, `ae2`, `ae200`, `ae2004`, `ae2005`, `ae2007`, `ae2009`, `ae201`, `ae2011`, `ae2013`, `ae2014`, `ae2015`, `ae2016`, `ae2018`, `ae2019`, `ae2023`, `ae2024`, `ae2025`, `ae2027`, `ae203`, `ae2030`, `ae2037`, `ae2039`, `ae204`, `ae2040`, `ae2041`, `ae2042`, `ae205`, `ae2054`, `ae2055`, `ae2061`, `ae2065`, `ae2067`, `ae2068`, `ae207`, `ae2072`, `ae2073`, `ae2077`, `ae208`, `ae2082`, `ae2083`, `ae2086`, `ae2089`, `ae209`, `ae2095`, `ae21`, `ae2102`, `ae2106`, `ae2110`, `ae2117`, `ae2118`, `ae2119`, `ae2120`, `ae2121`, `ae2122`, `ae2123`, `ae2125`, `ae2128`, `ae2129`, `ae213`, `ae2131`, `ae2132`, `ae2133`, `ae2136`, `ae2138`, `ae2139`, `ae214`, `ae2141`, `ae2142`, `ae2143`, `ae2145`, `ae2147`, `ae2149`, `ae215`, `ae2152`, `ae2156`, `ae2157`, `ae2158`, `ae2159`, `ae216`, `ae2160`, `ae2161`, `ae2162`, `ae2163`, `ae2169`, `ae2173`, `ae2175`, `ae2176`, `ae2179`, `ae2181`, `ae2184`, `ae2188`, `ae2189`, `ae2192`, `ae2193`, 
`ae2194`, `ae2195`, `ae2196`, `ae2197`, `ae2198`, `ae22`, `ae220`, `ae2200`, `ae2204`, `ae2207`, `ae2208`, `ae221`, `ae2210`, `ae2211`, `ae2213`, `ae2217`, `ae2218`, `ae222`, `ae2225`, `ae2227`, `ae2228`, `ae2229`, `ae224`, `ae2249`, `ae225`, `ae2251`, `ae2252`, `ae2254`, `ae2255`, `ae2258`, `ae226`, `ae2260`, `ae2261`, `ae2262`, `ae2263`, `ae2265`, `ae2266`, `ae2268`, `ae2269`, `ae227`, `ae2271`, `ae2274`, `ae2276`, `ae2277`, `ae2278`, `ae2279`, `ae2280`, `ae2281`, `ae2282`, `ae2283`, `ae2287`, `ae2293`, `ae2294`, `ae2295`, `ae2296`, `ae2297`, `ae2298`, `ae2299`, `ae23`, `ae230`, `ae2301`, `ae2305`, `ae2309`, `ae231`, `ae2311`, `ae2314`, `ae2315`, `ae2316`, `ae2318`, `ae2319`, `ae2325`, `ae2327`, `ae2336`, `ae234`, `ae2341`, `ae2342`, `ae2343`, `ae2347`, `ae2348`, `ae2349`, `ae235`, `ae2355`, `ae2356`, `ae2357`, `ae2359`, `ae2360`, `ae2362`, `ae2363`, `ae2364`, `ae2365`, `ae2366`, `ae2367`, `ae2368`, `ae237`, `ae2370`, `ae2371`, `ae2372`, `ae2375`, `ae2376`, `ae2378`, `ae2379`, `ae238`, `ae2380`, `ae2382`, `ae2385`, `ae2388`, `ae239`, `ae2393`, `ae24`, `ae240`, `ae2400`, `ae2402`, `ae2403`, `ae2404`, `ae2409`, `ae241`, `ae2411`, `ae2412`, `ae2414`, `ae2416`, `ae2417`, `ae2418`, `ae2419`, `ae2422`, `ae2423`, `ae2424`, `ae2427`, `ae243`, `ae2430`, `ae2432`, `ae2433`, `ae2435`, `ae2439`, `ae244`, `ae2440`, `ae2444`, `ae2445`, `ae2447`, `ae2449`, `ae245`, `ae2453`, `ae246`, `ae2465`, `ae247`, `ae2470`, `ae2475`, `ae248`, `ae2487`, `ae249`, `ae2491`, `ae2495`, `ae2498`, `ae2499`, `ae250`, `ae2500`, `ae2501`, `ae2502`, `ae2503`, `ae2504`, `ae2505`, `ae2507`, `ae2508`, `ae251`, `ae252`, `ae2520`, `ae2525`, `ae2528`, `ae2529`, `ae253`, `ae2532`, `ae2533`, `ae2535`, `ae2536`, `ae2538`, `ae254`, `ae2542`, `ae2545`, `ae2546`, `ae255`, `ae2551`, `ae2555`, `ae2556`, `ae256`, `ae2561`, `ae2569`, `ae2572`, `ae2577`, `ae2579`, `ae2586`, `ae2589`, `ae259`, `ae2597`, `ae260`, `ae2602`, `ae2603`, `ae2604`, `ae2608`, `ae261`, `ae2610`, `ae2611`, `ae2612`, `ae2613`, `ae2614`, 
`ae2615`, `ae2617`, `ae2618`, `ae262`, `ae2621`, `ae2625`, `ae2626`, `ae2627`, `ae2628`, `ae2629`, `ae263`, `ae2631`, `ae2633`, `ae2634`, `ae2635`, `ae2636`, `ae2637`, `ae2639`, `ae2640`, `ae2642`, `ae2643`, `ae2646`, `ae2648`, `ae2649`, `ae265`, `ae2650`, `ae2652`, `ae2653`, `ae2654`, `ae2658`, `ae2659`, `ae266`, `ae2666`, `ae2667`, `ae2669`, `ae2670`, `ae2671`, `ae2673`, `ae2674`, `ae2676`, `ae2678`, `ae268`, `ae2680`, `ae2683`, `ae2684`, `ae2685`, `ae2686`, `ae2688`, `ae2689`, `ae269`, `ae2690`, `ae2691`, `ae2692`, `ae2693`, `ae2696`, `ae2697`, `ae2698`, `ae2699`, `ae27`, `ae270`, `ae2702`, `ae2703`, `ae2708`, `ae271`, `ae2710`, `ae2715`, `ae272`, `ae2723`, `ae2726`, `ae2727`, `ae273`, `ae2736`, `ae2738`, `ae274`, `ae2740`, `ae2741`, `ae2742`, `ae2743`, `ae2745`, `ae2747`, `ae2748`, `ae2749`, `ae275`, `ae2751`, `ae2752`, `ae2753`, `ae2754`, `ae2757`, `ae2759`, `ae2760`, `ae2761`, `ae2762`, `ae2763`, `ae2764`, `ae2766`, `ae2767`, `ae2768`, `ae2770`, `ae2771`, `ae2772`, `ae2773`, `ae2774`, `ae278`, `ae2785`, `ae279`, `ae2796`, `ae2806`, `ae2807`, `ae282`, `ae2820`, `ae2829`, `ae283`, `ae2833`, `ae284`, `ae2841`, `ae2848`, `ae285`, `ae2851`, `ae2856`, `ae2858`, `ae286`, `ae2862`, `ae2864`, `ae2866`, `ae287`, `ae2871`, `ae288`, `ae2881`, `ae2884`, `ae2888`, `ae2890`, `ae2891`, `ae2895`, `ae29`, `ae2900`, `ae2903`, `ae2904`, `ae2905`, `ae2908`, `ae2909`, `ae291`, `ae2915`, `ae2918`, `ae292`, `ae2924`, `ae2925`, `ae2928`, `ae2929`, `ae2932`, `ae2934`, `ae2935`, `ae2937`, `ae2947`, `ae2950`, `ae2951`, `ae2953`, `ae2954`, `ae2955`, `ae2957`, `ae296`, `ae2961`, `ae2962`, `ae2963`, `ae2964`, `ae2965`, `ae2966`, `ae2967`, `ae2968`, `ae297`, `ae2971`, `ae2972`, `ae2973`, `ae2974`, `ae2975`, `ae2976`, `ae2978`, `ae298`, `ae2980`, `ae2981`, `ae2982`, `ae2983`, `ae2987`, `ae2990`, `ae2991`, `ae2992`, `ae2995`, `ae2996`, `ae2998`, `ae2999`, `ae3`, `ae30`, `ae300`, `ae3005`, `ae3007`, `ae3009`, `ae301`, `ae3010`, `ae3012`, `ae3014`, `ae3018`, `ae3022`, `ae303`, `ae3031`, 
`ae3034`, `ae3037`, `ae3042`, `ae3043`, `ae3046`, `ae3047`, `ae3049`, `ae305`, `ae3050`, `ae3052`, `ae3053`, `ae306`, `ae3064`, `ae3068`, `ae307`, `ae3078`, `ae308`, `ae3081`, `ae3088`, `ae3089`, `ae309`, `ae3097`, `ae3100`, `ae3101`, `ae3102`, `ae3103`, `ae3104`, `ae3105`, `ae3106`, `ae3107`, `ae3108`, `ae311`, `ae3110`, `ae3112`, `ae3115`, `ae3116`, `ae3117`, `ae3119`, `ae3120`, `ae3121`, `ae3122`, `ae3123`, `ae3127`, `ae3128`, `ae3132`, `ae3133`, `ae3136`, `ae3139`, `ae314`, `ae3141`, `ae3142`, `ae3143`, `ae3146`, `ae3149`, `ae315`, `ae3151`, `ae3153`, `ae3154`, `ae3155`, `ae3156`, `ae3157`, `ae3158`, `ae3162`, `ae3166`, `ae3167`, `ae3168`, `ae317`, `ae3170`, `ae3171`, `ae3174`, `ae3175`, `ae3176`, `ae3177`, `ae3178`, `ae318`, `ae3186`, `ae3188`, `ae3189`, `ae319`, `ae3190`, `ae3191`, `ae3193`, `ae3197`, `ae320`, `ae3201`, `ae3202`, `ae321`, `ae3215`, `ae3216`, `ae3217`, `ae3219`, `ae3221`, `ae3228`, `ae323`, `ae3235`, `ae324`, `ae3250`, `ae3251`, `ae3252`, `ae3254`, `ae3255`, `ae3256`, `ae3259`, `ae326`, `ae3260`, `ae3262`, `ae3263`, `ae3264`, `ae327`, `ae3280`, `ae3282`, `ae3287`, `ae3289`, `ae329`, `ae3291`, `ae3292`, `ae3294`, `ae3295`, `ae3296`, `ae3297`, `ae3298`, `ae33`, `ae330`, `ae3301`, `ae3302`, `ae3305`, `ae3306`, `ae331`, `ae3311`, `ae3312`, `ae3316`, `ae3317`, `ae3318`, `ae3319`, `ae3328`, `ae333`, `ae3330`, `ae3331`, `ae3332`, `ae3334`, `ae3335`, `ae3336`, `ae334`, `ae3342`, `ae3344`, `ae3346`, `ae335`, `ae3350`, `ae3351`, `ae3354`, `ae3356`, `ae3357`, `ae336`, `ae3360`, `ae3365`, `ae3369`, `ae337`, `ae3372`, `ae3380`, `ae3381`, `ae3383`, `ae3384`, `ae3386`, `ae3391`, `ae34`, `ae340`, `ae3401`, `ae3402`, `ae3406`, `ae3413`, `ae3429`, `ae3430`, `ae3432`, `ae3439`, `ae3447`, `ae3448`, `ae3451`, `ae3455`, `ae3456`, `ae3463`, `ae3473`, `ae3476`, `ae3477`, `ae3488`, `ae349`, `ae3499`, `ae35`, `ae350`, `ae3501`, `ae3502`, `ae3504`, `ae3505`, `ae3508`, `ae3514`, `ae3516`, `ae3518`, `ae3519`, `ae3520`, `ae3521`, `ae3524`, `ae353`, `ae3530`, `ae3532`, 
`ae3534`, `ae3538`, `ae354`, `ae3543`, `ae3544`, `ae3545`, `ae3558`, `ae3564`, `ae3568`, `ae3571`, `ae3573`, `ae3574`, `ae3579`, `ae3582`, `ae3594`, `ae3597`, `ae36`, `ae3606`, `ae3610`, `ae3614`, `ae3616`, `ae3617`, `ae3618`, `ae3619`, `ae362`, `ae3629`, `ae363`, `ae3638`, `ae3647`, `ae365`, `ae366`, `ae3660`, `ae3671`, `ae3673`, `ae3674`, `ae3675`, `ae3676`, `ae3678`, `ae3679`, `ae368`, `ae3680`, `ae3681`, `ae3687`, `ae3688`, `ae3689`, `ae3690`, `ae3693`, `ae3694`, `ae3697`, `ae3699`, `ae37`, `ae3701`, `ae3702`, `ae3703`, `ae3709`, `ae3717`, `ae3718`, `ae3720`, `ae3722`, `ae3724`, `ae3727`, `ae3729`, `ae3731`, `ae3733`, `ae3734`, `ae3738`, `ae374`, `ae3740`, `ae3741`, `ae3746`, `ae3747`, `ae3751`, `ae3753`, `ae3754`, `ae3759`, `ae376`, `ae3762`, `ae3766`, `ae3769`, `ae377`, `ae3771`, `ae3781`, `ae3782`, `ae3783`, `ae3785`, `ae3787`, `ae3789`, `ae379`, `ae3793`, `ae380`, `ae3805`, `ae3807`, `ae3809`, `ae3816`, `ae3817`, `ae3822`, `ae3828`, `ae3835`, `ae3836`, `ae3837`, `ae3838`, `ae3839`, `ae3840`, `ae3843`, `ae3845`, `ae3848`, `ae385`, `ae3852`, `ae3855`, `ae3856`, `ae3857`, `ae3866`, `ae3868`, `ae3869`, `ae3877`, `ae3878`, `ae388`, `ae3881`, `ae3885`, `ae3887`, `ae3889`, `ae3893`, `ae3894`, `ae3896`, `ae3899`, `ae39`, `ae390`, `ae3900`, `ae3905`, `ae3906`, `ae3907`, `ae3910`, `ae3914`, `ae3919`, `ae3922`, `ae3935`, `ae3936`, `ae394`, `ae395`, `ae3953`, `ae3956`, `ae3957`, `ae3962`, `ae397`, `ae3979`, `ae398`, `ae3982`, `ae3986`, `ae399`, `ae40`, `ae400`, `ae4003`, `ae4004`, `ae4009`, `ae401`, `ae4010`, `ae4011`, `ae4012`, `ae402`, `ae4021`, `ae4029`, `ae403`, `ae4031`, `ae4037`, `ae404`, `ae4045`, `ae4048`, `ae4049`, `ae405`, `ae4050`, `ae406`, `ae407`, `ae408`, `ae41`, `ae410`, `ae4100`, `ae4103`, `ae411`, `ae4110`, `ae4111`, `ae4113`, `ae4116`, `ae4117`, `ae4119`, `ae4125`, `ae4126`, `ae4128`, `ae4129`, `ae413`, `ae4131`, `ae4133`, `ae4137`, `ae4138`, `ae4139`, `ae414`, `ae4141`, `ae4143`, `ae4145`, `ae4146`, `ae4151`, `ae4157`, `ae416`, `ae4161`, `ae4162`, 
`ae4163`, `ae4164`, `ae4166`, `ae4167`, `ae4169`, `ae4171`, `ae4172`, `ae4173`, `ae4174`, `ae4175`, `ae4176`, `ae4177`, `ae4178`, `ae4179`, `ae4180`, `ae4181`, `ae4182`, `ae4183`, `ae4184`, `ae4185`, `ae4186`, `ae4187`, `ae4188`, `ae4190`, `ae4191`, `ae4193`, `ae4195`, `ae4197`, `ae4198`, `ae4199`, `ae42`, `ae420`, `ae4203`, `ae4204`, `ae4206`, `ae4209`, `ae421`, `ae4211`, `ae4212`, `ae4215`, `ae4216`, `ae4219`, `ae422`, `ae4220`, `ae4221`, `ae4229`, `ae423`, `ae4231`, `ae4235`, `ae4247`, `ae4248`, `ae4250`, `ae4253`, `ae4255`, `ae4256`, `ae4257`, `ae4259`, `ae4260`, `ae4261`, `ae4264`, `ae4265`, `ae4266`, `ae4268`, `ae4270`, `ae4271`, `ae4274`, `ae4275`, `ae4276`, `ae4278`, `ae428`, `ae4281`, `ae4282`, `ae4283`, `ae4286`, `ae4289`, `ae4293`, `ae4298`, `ae43`, `ae430`, `ae4300`, `ae4305`, `ae4306`, `ae4307`, `ae4308`, `ae4309`, `ae431`, `ae4310`, `ae4313`, `ae4314`, `ae4315`, `ae4316`, `ae4317`, `ae4318`, `ae4320`, `ae4322`, `ae4323`, `ae4324`, `ae4325`, `ae4326`, `ae4327`, `ae4328`, `ae4329`, `ae4330`, `ae4331`, `ae4332`, `ae4335`, `ae4337`, `ae4339`, `ae4340`, `ae4342`, `ae4354`, `ae4356`, `ae4357`, `ae4359`, `ae4360`, `ae4365`, `ae4366`, `ae4367`, `ae4368`, `ae4369`, `ae4371`, `ae4373`, `ae4374`, `ae4375`, `ae4376`, `ae4377`, `ae4378`, `ae4379`, `ae4380`, `ae4382`, `ae4383`, `ae4384`, `ae4385`, `ae4386`, `ae4387`, `ae4388`, `ae4389`, `ae439`, `ae4390`, `ae4395`, `ae4396`, `ae4398`, `ae44`, `ae4404`, `ae4408`, `ae441`, `ae4410`, `ae4412`, `ae4414`, `ae4415`, `ae4416`, `ae4417`, `ae4418`, `ae4420`, `ae4421`, `ae4422`, `ae4423`, `ae4424`, `ae4425`, `ae4426`, `ae4427`, `ae4429`, `ae443`, `ae4430`, `ae4431`, `ae4432`, `ae4433`, `ae4434`, `ae4435`, `ae4436`, `ae4437`, `ae4438`, `ae4439`, `ae4440`, `ae4441`, `ae4442`, `ae4443`, `ae4444`, `ae4445`, `ae4446`, `ae4447`, `ae4448`, `ae4449`, `ae445`, `ae4450`, `ae4451`, `ae4452`, `ae4453`, `ae4454`, `ae4455`, `ae4456`, `ae4457`, `ae4458`, `ae4459`, `ae4463`, `ae4469`, `ae4470`, `ae4471`, `ae4472`, `ae4473`, `ae4474`, 
`ae4475`, `ae4476`, `ae4477`, `ae4478`, `ae4479`, `ae448`, `ae4480`, `ae4481`, `ae4482`, `ae4483`, `ae4484`, `ae4485`, `ae4486`, `ae4487`, `ae4488`, `ae4489`, `ae4490`, `ae4491`, `ae4493`, `ae4495`, `ae4496`, `ae4497`, `ae4498`, `ae45`, `ae450`, `ae4500`, `ae4501`, `ae4502`, `ae451`, `ae4512`, `ae4513`, `ae4514`, `ae4515`, `ae4517`, `ae4518`, `ae4519`, `ae4520`, `ae4521`, `ae4523`, `ae4525`, `ae4526`, `ae4527`, `ae4528`, `ae4529`, `ae4530`, `ae4532`, `ae4534`, `ae4535`, `ae4536`, `ae4539`, `ae4540`, `ae4541`, `ae4542`, `ae4543`, `ae4544`, `ae4545`, `ae4546`, `ae4547`, `ae4548`, `ae4549`, `ae4550`, `ae4551`, `ae4552`, `ae4553`, `ae4554`, `ae4555`, `ae4556`, `ae4558`, `ae4559`, `ae456`, `ae4562`, `ae4563`, `ae4567`, `ae4569`, `ae4571`, `ae4572`, `ae4573`, `ae4575`, `ae4576`, `ae4577`, `ae4578`, `ae4579`, `ae458`, `ae4580`, `ae4581`, `ae4582`, `ae4583`, `ae4584`, `ae4585`, `ae4588`, `ae4589`, `ae459`, `ae4590`, `ae4591`, `ae4592`, `ae4593`, `ae4594`, `ae4595`, `ae4596`, `ae4597`, `ae4598`, `ae4599`, `ae460`, `ae4600`, `ae4601`, `ae4602`, `ae4603`, `ae4604`, `ae4605`, `ae4606`, `ae4607`, `ae4609`, `ae4610`, `ae462`, `ae4621`, `ae4622`, `ae463`, `ae4631`, `ae4632`, `ae4634`, `ae4635`, `ae4637`, `ae4638`, `ae4639`, `ae4640`, `ae4641`, `ae4642`, `ae4643`, `ae4644`, `ae4645`, `ae4646`, `ae4647`, `ae4648`, `ae4649`, `ae465`, `ae4650`, `ae4651`, `ae4652`, `ae4653`, `ae4654`, `ae4655`, `ae4656`, `ae4658`, `ae4659`, `ae466`, `ae4662`, `ae4664`, `ae4665`, `ae4666`, `ae4669`, `ae467`, `ae4670`, `ae4671`, `ae4672`, `ae4674`, `ae4675`, `ae4676`, `ae4677`, `ae4679`, `ae468`, `ae4681`, `ae4682`, `ae4683`, `ae4684`, `ae4685`, `ae4686`, `ae4688`, `ae4689`, `ae469`, `ae4690`, `ae4692`, `ae4693`, `ae4696`, `ae4697`, `ae4699`, `ae4702`, `ae4703`, `ae4704`, `ae4706`, `ae4707`, `ae4709`, `ae4710`, `ae4711`, `ae4712`, `ae4713`, `ae4714`, `ae4715`, `ae4717`, `ae4718`, `ae4720`, `ae4721`, `ae4722`, `ae4723`, `ae4725`, `ae4727`, `ae4728`, `ae4729`, `ae473`, `ae4730`, `ae4732`, `ae4733`, 
`ae4735`, `ae4736`, `ae4738`, `ae4739`, `ae4740`, `ae4741`, `ae4742`, `ae4744`, `ae4745`, `ae4746`, `ae4747`, `ae4749`, `ae475`, `ae4750`, `ae4756`, `ae4759`, `ae476`, `ae4762`, `ae4763`, `ae4764`, `ae4765`, `ae4767`, `ae4768`, `ae4769`, `ae477`, `ae4770`, `ae4771`, `ae4772`, `ae4773`, `ae4775`, `ae4776`, `ae4777`, `ae4779`, `ae478`, `ae4780`, `ae4781`, `ae4785`, `ae4788`, `ae4789`, `ae479`, `ae4793`, `ae4795`, `ae4796`, `ae4798`, `ae48`, `ae480`, `ae4800`, `ae4801`, `ae4802`, `ae4803`, `ae4804`, `ae4805`, `ae4806`, `ae481`, `ae482`, `ae483`, `ae484`, `ae485`, `ae486`, `ae487`, `ae489`, `ae49`, `ae490`, `ae491`, `ae492`, `ae493`, `ae494`, `ae495`, `ae496`, `ae498`, `ae499`, `ae5`, `ae50`, `ae510`, `ae511`, `ae514`, `ae518`, `ae519`, `ae52`, `ae520`, `ae522`, `ae525`, `ae526`, `ae528`, `ae53`, `ae530`, `ae531`, `ae532`, `ae538`, `ae540`, `ae542`, `ae544`, `ae546`, `ae549`, `ae551`, `ae552`, `ae555`, `ae557`, `ae56`, `ae561`, `ae562`, `ae563`, `ae564`, `ae565`, `ae567`, `ae568`, `ae569`, `ae572`, `ae573`, `ae574`, `ae58`, `ae583`, `ae59`, `ae591`, `ae595`, `ae599`, `ae6`, `ae60`, `ae601`, `ae602`, `ae606`, `ae607`, `ae608`, `ae609`, `ae613`, `ae615`, `ae616`, `ae62`, `ae622`, `ae624`, `ae627`, `ae628`, `ae63`, `ae631`, `ae635`, `ae637`, `ae638`, `ae64`, `ae640`, `ae641`, `ae643`, `ae644`, `ae646`, `ae647`, `ae648`, `ae65`, `ae655`, `ae659`, `ae66`, `ae660`, `ae661`, `ae662`, `ae663`, `ae664`, `ae665`, `ae667`, `ae669`, `ae67`, `ae670`, `ae671`, `ae676`, `ae678`, `ae679`, `ae681`, `ae683`, `ae684`, `ae685`, `ae687`, `ae688`, `ae69`, `ae690`, `ae692`, `ae693`, `ae696`, `ae697`, `ae698`, `ae699`, `ae7`, `ae706`, `ae707`, `ae708`, `ae709`, `ae71`, `ae72`, `ae723`, `ae724`, `ae725`, `ae729`, `ae730`, `ae737`, `ae738`, `ae74`, `ae747`, `ae748`, `ae75`, `ae751`, `ae752`, `ae753`, `ae754`, `ae756`, `ae757`, `ae758`, `ae76`, `ae761`, `ae763`, `ae764`, `ae767`, `ae768`, `ae769`, `ae77`, `ae770`, `ae778`, `ae781`, `ae783`, `ae79`, `ae791`, `ae792`, `ae795`, `ae796`, `ae797`, 
`ae8`, `ae808`, `ae81`, `ae813`, `ae815`, `ae82`, `ae821`, `ae822`, `ae826`, `ae83`, `ae830`, `ae832`, `ae837`, `ae838`, `ae839`, `ae84`, `ae843`, `ae844`, `ae85`, `ae850`, `ae852`, `ae853`, `ae855`, `ae857`, `ae858`, `ae86`, `ae860`, `ae861`, `ae865`, `ae866`, `ae867`, `ae869`, `ae87`, `ae872`, `ae873`, `ae875`, `ae877`, `ae878`, `ae88`, `ae883`, `ae884`, `ae885`, `ae887`, `ae888`, `ae89`, `ae890`, `ae893`, `ae894`, `ae897`, `ae9`, `ae90`, `ae904`, `ae905`, `ae906`, `ae907`, `ae909`, `ae91`, `ae911`, `ae913`, `ae915`, `ae916`, `ae917`, `ae918`, `ae92`, `ae921`, `ae922`, `ae923`, `ae926`, `ae928`, `ae929`, `ae93`, `ae931`, `ae933`, `ae936`, `ae938`, `ae939`, `ae940`, `ae942`, `ae943`, `ae946`, `ae947`, `ae948`, `ae956`, `ae958`, `ae96`, `ae960`, `ae968`, `ae97`, `ae972`, `ae973`, `ae975`, `ae981`, `ae985`, `ae987`, `ae988`, `ae989`, `ae99`, `ae990`, `ae994`, `ae995`, `ae996`, `ae998`, `ae999`
[1] 21843 480
aernas1_study_samples_bulk <- colnames(aernas1_counts_raw_qc_umicorr_annotFilt[, -(1:9)])
length(aernas1_study_samples_bulk)
[1] 622
[1] 2595
# 2595
aernas1_setdif_samples_AERNAS1vsAEDBCEA <- setdiff(aernas1_study_samples_bulk, study_samples_AEDBCEA)
length(aernas1_setdif_samples_AERNAS1vsAEDBCEA) # 0
[1] 0
aernas1_setdif_samples_AEDBCEAvsAERNAS1 <- setdiff(study_samples_AEDBCEA, aernas1_study_samples_bulk)
length(aernas1_setdif_samples_AEDBCEAvsAERNAS1) # 1973
[1] 1973
AEDB_AERNAS1_filt <- AEDB.CEA[AEDB.CEA$STUDY_NUMBER %in% aernas1_study_samples_bulk,]
table(AEDB_AERNAS1_filt$Artery_summary, AEDB_AERNAS1_filt$Gender)
female  male  Artery_summary
0       0     No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA
154     466   carotid (left & right)
0       0     femoral/iliac (left, right or both sides)
0       2     other carotid arteries (common, external)
0       0     carotid bypass and injury (left, right or both sides)
0       0     aneurysmata (carotid & femoral)
0       0     aorta
0       0     other arteries (renal, popliteal, vertebral)
0       0     femoral bypass, angioseal and injury (left, right or both sides)
aernas2_study_samples_bulk <- colnames(aernas2_counts_raw_qc_umicorr_annotFilt[, -(1:9)])
length(aernas2_study_samples_bulk)
[1] 471
[1] 2595
# 2595
aernas2_setdif_samples_AERNAS2vsAEDBCEA <- setdiff(aernas2_study_samples_bulk, study_samples_AEDBCEA)
length(aernas2_setdif_samples_AERNAS2vsAEDBCEA) # 0
[1] 0
aernas2_setdif_samples_AEDBCEAvsAERNAS2 <- setdiff(study_samples_AEDBCEA, aernas2_study_samples_bulk)
length(aernas2_setdif_samples_AEDBCEAvsAERNAS2) # 2124
[1] 2124
AEDB_AERNAS2_filt <- AEDB.CEA[AEDB.CEA$STUDY_NUMBER %in% aernas2_study_samples_bulk,]
table(AEDB_AERNAS2_filt$Artery_summary, AEDB_AERNAS2_filt$Gender)
female  male  Artery_summary
0       0     No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA
155     314   carotid (left & right)
0       0     femoral/iliac (left, right or both sides)
0       2     other carotid arteries (common, external)
0       0     carotid bypass and injury (left, right or both sides)
0       0     aneurysmata (carotid & femoral)
0       0     aorta
0       0     other arteries (renal, popliteal, vertebral)
0       0     femoral bypass, angioseal and injury (left, right or both sides)
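The overlap checks above compare sample identifiers in both directions with `setdiff()`. A minimal base-R sketch of the same pattern on toy vectors follows; the names `rna_ids` and `clinical_ids` and their values are hypothetical and not taken from the study data.

```r
# Toy identifiers; hypothetical values for illustration only
rna_ids      <- c("ae1", "ae2", "ae5")
clinical_ids <- c("ae1", "ae2", "ae3", "ae4", "ae5")

# RNAseq samples without a clinical record (ideally none)
only_in_rna <- setdiff(rna_ids, clinical_ids)

# Clinical records without RNAseq data (expected to be many)
only_in_clinical <- setdiff(clinical_ids, rna_ids)

# Samples present in both sources
shared_ids <- intersect(rna_ids, clinical_ids)

length(only_in_rna)       # size of the first difference
length(only_in_clinical)  # size of the second difference

# With unique identifiers, shared + clinical-only adds up to all clinical records
stopifnot(length(shared_ids) + length(only_in_clinical) == length(clinical_ids))
```

A first difference of length 0, as in the output above, means every sequenced sample has a clinical record.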
# Cut up aernas1_counts_raw_qc_umicorr_annotFilt into 'assay' and 'ranges' part
aernas1_counts <- as.data.frame(aernas1_counts_raw_qc_umicorr_annotFilt[,-(1:9)]) ## assay part
# aernas1_counts <- aernas1_counts %>% mutate_if(is.numeric, as.integer)
rownames(aernas1_counts) <- aernas1_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID ## assign rownames
id <- aernas1_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID
id[ id %in% id[duplicated(id)] ]
character(0)
aernas1_bulkRNA_rowRanges <- GRanges(aernas1_counts_raw_qc_umicorr_annotFilt$chr, ## construct a GRanges object containing 4 columns (seqnames, ranges, strand, seqinfo) plus a metadata column (feature_id): this will be the 'rowRanges' bit
IRanges(aernas1_counts_raw_qc_umicorr_annotFilt$start, aernas1_counts_raw_qc_umicorr_annotFilt$end),
strand = aernas1_counts_raw_qc_umicorr_annotFilt$strand,
feature_id = aernas1_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID) #, df$pid)
names(aernas1_bulkRNA_rowRanges) <- aernas1_bulkRNA_rowRanges$feature_id
# ?org.Hs.eg.db
# ?AnnotationDb
aernas1_bulkRNA_rowRanges$symbol <- mapIds(org.Hs.eg.db,
keys = aernas1_bulkRNA_rowRanges$feature_id,
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
'select()' returned 1:many mapping between keys and columns
# Reference: https://shiring.github.io/genome/2016/10/23/AnnotationDbi
# gene dataframe for EnsDb.Hsapiens.v86 # https://github.com/stuart-lab/signac/issues/79
aernas1_gene_dataframe_EnsDb <- ensembldb::select(EnsDb.Hsapiens.v86, keys = aernas1_bulkRNA_rowRanges$feature_id,
columns = c("ENTREZID", "SYMBOL", "GENEBIOTYPE"), keytype = "GENEID")
colnames(aernas1_gene_dataframe_EnsDb) <- c("Ensembl", "Entrez", "HGNC", "GENEBIOTYPE")
colnames(aernas1_gene_dataframe_EnsDb) <- paste(colnames(aernas1_gene_dataframe_EnsDb), "GRCh38p13_EnsDb86", sep = "_")
head(aernas1_gene_dataframe_EnsDb)
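# NOTE: after the paste() above, the columns are named "Ensembl_GRCh38p13_EnsDb86" and "GENEBIOTYPE_GRCh38p13_EnsDb86";
# the shorter names used below do not exist, so the lookup yields NULL and no biotype column is added (the object printed below has 2 metadata columns)
aernas1_bulkRNA_rowRanges$GENEBIOTYPE_EnsDb86 <- aernas1_gene_dataframe_EnsDb$GENEBIOTYPE_EnsDb86[match(aernas1_bulkRNA_rowRanges$feature_id, aernas1_gene_dataframe_EnsDb$Ensembl_EnsDb86)]
aernas1_bulkRNA_rowRanges
GRanges object with 21835 ranges and 2 metadata columns: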
seqnames ranges strand | feature_id symbol
<Rle> <IRanges> <Rle> | <character> <character>
ENSG00000000005 X 100584936-100599885 + | ENSG00000000005 TNMD
ENSG00000000419 20 50934867-50959140 - | ENSG00000000419 DPM1
ENSG00000000457 1 169849631-169894267 - | ENSG00000000457 SCYL3
ENSG00000000460 1 169662007-169854080 + | ENSG00000000460 FIRRM
ENSG00000000938 1 27612064-27635185 - | ENSG00000000938 FGR
... ... ... ... . ... ...
ENSG00000290203 15 68930504-69062743 + | ENSG00000290203 NOX5
ENSG00000290292 14 23272422-23299796 - | ENSG00000290292 HOMEZ
ENSG00000290320 17 32895433-32906586 + | ENSG00000290320 H2BN1
ENSG00000291237 6 159669069-159762529 - | ENSG00000291237 SOD2
ENSG00000274714 CHR_HSCHR19KIR_FH06_.. 54819131-54834528 + | ENSG00000274714 KIR2DS4
-------
seqinfo: 331 sequences from an unspecified genome; no seqlengths
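The annotation step above renames the columns of a lookup table with a suffix and then pulls one column into the ranges object via `match()`. A minimal base-R sketch of that pattern, using toy data, is given here; the objects `feature_id` and `annot`, their values, and the helper names `key_col` and `val_col` are hypothetical. Building the lookup names from the same suffix keeps them in step with the renamed columns.

```r
# Toy gene identifiers and a toy annotation table; hypothetical values
feature_id <- c("ENSG_A", "ENSG_B", "ENSG_C")
annot <- data.frame(Ensembl     = c("ENSG_C", "ENSG_A"),
                    GENEBIOTYPE = c("lincRNA", "protein_coding"),
                    stringsAsFactors = FALSE)

# Duplicated identifiers would make a keyed lookup ambiguous, so check first
stopifnot(!anyDuplicated(feature_id), !anyDuplicated(annot$Ensembl))

# Add a suffix to every column name and derive the lookup names from the same suffix
suffix <- "GRCh38p13_EnsDb86"
colnames(annot) <- paste(colnames(annot), suffix, sep = "_")
key_col <- paste("Ensembl", suffix, sep = "_")
val_col <- paste("GENEBIOTYPE", suffix, sep = "_")

# match() gives, for each feature, its row in the annotation table (NA if absent)
idx <- match(feature_id, annot[[key_col]])
biotype <- annot[[val_col]][idx]
biotype
```

Features without an annotation entry end up as `NA` rather than being dropped, so the result stays aligned with `feature_id`.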
# merging the two dataframes by Ensembl gene ID (feature_id)
# aernas1_bulkRNA_rowRangesHg19Ensemblb86 <- GRanges(merge(aernas1_bulkRNA_rowRanges, aernas1_gene_dataframe_EnsDb, by.x = "feature_id", by.y = "Ensembl_EnsDb86", sort = FALSE, all.x = TRUE))
# names(aernas1_bulkRNA_rowRangesHg19Ensemblb86) <- aernas1_bulkRNA_rowRangesHg19Ensemblb86$feature_id
# aernas1_bulkRNA_rowRangesHg19Ensemblb86
# temp <- as.data.frame(table(aernas1_bulkRNA_rowRanges$GENEBIOTYPE_EnsDb86))
# colnames(temp) <- c("GeneBiotype", "Count")
#
# ggpubr::ggbarplot(temp, x = "GeneBiotype", y = "Count",
# color = "GeneBiotype", fill = "GeneBiotype",
# xlab = "gene type") +
# theme(axis.text.x = element_text(angle = 45))
# rm(temp)
# Cut up aernas2_counts_raw_qc_umicorr_annotFilt into 'assay' and 'ranges' part
aernas2_counts <- as.data.frame(aernas2_counts_raw_qc_umicorr_annotFilt[,-(1:9)]) ## assay part
# aernas2_counts <- aernas2_counts %>% mutate_if(is.numeric, as.integer)
rownames(aernas2_counts) <- aernas2_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID ## assign rownames
id <- aernas2_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID
id[ id %in% id[duplicated(id)] ]
character(0)
aernas2_bulkRNA_rowRanges <- GRanges(aernas2_counts_raw_qc_umicorr_annotFilt$chr, ## construct a GRanges object containing 4 columns (seqnames, ranges, strand, seqinfo) plus a metadata column (feature_id): this will be the 'rowRanges' bit
IRanges(aernas2_counts_raw_qc_umicorr_annotFilt$start, aernas2_counts_raw_qc_umicorr_annotFilt$end),
strand = aernas2_counts_raw_qc_umicorr_annotFilt$strand,
feature_id = aernas2_counts_raw_qc_umicorr_annotFilt$ENSEMBL_gene_ID) #, df$pid)
names(aernas2_bulkRNA_rowRanges) <- aernas2_bulkRNA_rowRanges$feature_id
# ?org.Hs.eg.db
# ?AnnotationDb
aernas2_bulkRNA_rowRanges$symbol <- mapIds(org.Hs.eg.db,
keys = aernas2_bulkRNA_rowRanges$feature_id,
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
'select()' returned 1:many mapping between keys and columns
# Reference: https://shiring.github.io/genome/2016/10/23/AnnotationDbi
# gene dataframe for EnsDb.Hsapiens.v86 # https://github.com/stuart-lab/signac/issues/79
aernas2_gene_dataframe_EnsDb <- ensembldb::select(EnsDb.Hsapiens.v86, keys = aernas2_bulkRNA_rowRanges$feature_id,
columns = c("ENTREZID", "SYMBOL", "GENEBIOTYPE"), keytype = "GENEID")
colnames(aernas2_gene_dataframe_EnsDb) <- c("Ensembl", "Entrez", "HGNC", "GENEBIOTYPE")
colnames(aernas2_gene_dataframe_EnsDb) <- paste(colnames(aernas2_gene_dataframe_EnsDb), "GRCh38p13_EnsDb86", sep = "_")
head(aernas2_gene_dataframe_EnsDb)
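# NOTE: after the paste() above, the columns are named "Ensembl_GRCh38p13_EnsDb86" and "GENEBIOTYPE_GRCh38p13_EnsDb86";
# the shorter names used below do not exist, so the lookup yields NULL and no biotype column is added (the object printed below has 2 metadata columns)
aernas2_bulkRNA_rowRanges$GENEBIOTYPE_EnsDb86 <- aernas2_gene_dataframe_EnsDb$GENEBIOTYPE_EnsDb86[match(aernas2_bulkRNA_rowRanges$feature_id, aernas2_gene_dataframe_EnsDb$Ensembl_EnsDb86)]
aernas2_bulkRNA_rowRanges
GRanges object with 21843 ranges and 2 metadata columns: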
seqnames ranges strand | feature_id symbol
<Rle> <IRanges> <Rle> | <character> <character>
ENSG00000000005 X 100584936-100599885 + | ENSG00000000005 TNMD
ENSG00000000419 20 50934867-50959140 - | ENSG00000000419 DPM1
ENSG00000000457 1 169849631-169894267 - | ENSG00000000457 SCYL3
ENSG00000000460 1 169662007-169854080 + | ENSG00000000460 FIRRM
ENSG00000000938 1 27612064-27635185 - | ENSG00000000938 FGR
... ... ... ... . ... ...
ENSG00000290203 15 68930504-69062743 + | ENSG00000290203 NOX5
ENSG00000290292 14 23272422-23299796 - | ENSG00000290292 HOMEZ
ENSG00000290320 17 32895433-32906586 + | ENSG00000290320 H2BN1
ENSG00000291237 6 159669069-159762529 - | ENSG00000291237 SOD2
ENSG00000281861 CHR_HSCHR5_5_CTG1 524112-524332 - | ENSG00000281861 SLC9A3
-------
seqinfo: 331 sequences from an unspecified genome; no seqlengths
# merging the two dataframes by Ensembl gene ID (feature_id)
# aernas2_bulkRNA_rowRangesHg19Ensemblb86 <- GRanges(merge(aernas2_bulkRNA_rowRanges, aernas2_gene_dataframe_EnsDb, by.x = "feature_id", by.y = "Ensembl_EnsDb86", sort = FALSE, all.x = TRUE))
# names(aernas2_bulkRNA_rowRangesHg19Ensemblb86) <- aernas2_bulkRNA_rowRangesHg19Ensemblb86$feature_id
# aernas2_bulkRNA_rowRangesHg19Ensemblb86
# temp <- as.data.frame(table(aernas2_bulkRNA_rowRanges$GENEBIOTYPE_EnsDb86))
# colnames(temp) <- c("GeneBiotype", "Count")
#
# ggpubr::ggbarplot(temp, x = "GeneBiotype", y = "Count",
# color = "GeneBiotype", fill = "GeneBiotype",
# xlab = "gene type") +
# theme(axis.text.x = element_text(angle = 45))
# rm(temp)
# match up with meta data of RNAseq experiment
aernas1_meta_filt <- aernas1_meta %>%
dplyr::filter(study_number %in% AEDB.CEA.sampleList) # keep only meta-data rows whose study_number is in AEDB.CEA.sampleList; filter() keeps the original row order, no sorting happens here
# combine meta data from experiment with clinical data
aernas1_meta_clin <- merge(aernas1_meta_filt, AEDB.CEA, by.x = "study_number", by.y = "STUDY_NUMBER",
sort = FALSE, all.x = TRUE)
aernas1_meta_clin %<>%
# mutate(macrophages = factor(macrophages, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(smc = factor(smc, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(calcification = factor(calcification, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(collagen = factor(collagen, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(fat = factor(fat, levels = c("no fat", "< 40% fat", "> 40% fat"))) %>%
mutate(study_number_row = study_number) %>%
as.data.frame() %>%
column_to_rownames("study_number_row")
mutate: new variable 'study_number_row' (character) with 665 unique values and 0% NA
[1] 665 1215
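The merge above relies on `sort = FALSE`, for which base R's `merge()` documents the row order as unspecified. A minimal sketch, on hypothetical toy data (`meta`, `clin` and their values are invented for illustration), of restoring the original sample order explicitly and then setting row names:

```r
# Toy experiment meta-data and clinical data; hypothetical values
meta <- data.frame(study_number = c("ae3", "ae1", "ae2"),
                   batch        = c("b1", "b1", "b2"),
                   stringsAsFactors = FALSE)
clin <- data.frame(STUDY_NUMBER = c("ae1", "ae2", "ae4"),
                   Age          = c(70, 65, 58),
                   stringsAsFactors = FALSE)

# Left join: keep every meta-data row, add clinical columns where available
merged <- merge(meta, clin, by.x = "study_number", by.y = "STUDY_NUMBER",
                sort = FALSE, all.x = TRUE)

# Put the rows back in the order of the meta-data, whatever order merge() returned
merged <- merged[match(meta$study_number, merged$study_number), , drop = FALSE]

# Use the study number as row name, as done above with column_to_rownames()
rownames(merged) <- merged$study_number
merged
```

Samples without a clinical record (here `ae3`) are kept with `NA` in the clinical columns, which mirrors `all.x = TRUE` in the merge above.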
We don’t have experimental meta-data for AERNAS2 yet, so we use the clinical data (AEDB.CEA) only.
# match up with meta data of RNAseq experiment
# aernas2_meta_filt <- aernas2_meta %>%
# dplyr::filter(study_number %in% AEDB.CEA.sampleList) # select gene expression of only patients in RNA-seq AE df, sort in same order as metadata study_number
# combine meta data from experiment with clinical data
# aernas2_meta_clin <- merge(aernas2_meta_filt, AEDB.CEA, by.x = "study_number", by.y = "STUDY_NUMBER",
# sort = FALSE, all.x = TRUE)
aernas2_meta_clin = AEDB.CEA
aernas2_meta_clin %<>%
# mutate(macrophages = factor(macrophages, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(smc = factor(smc, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(calcification = factor(calcification, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(collagen = factor(collagen, levels = c("no staining", "minor staining", "moderate staining", "heavy staining"))) %>%
# mutate(fat = factor(fat, levels = c("no fat", "< 40% fat", "> 40% fat"))) %>%
mutate(study_number_row = STUDY_NUMBER) %>%
as.data.frame() %>%
column_to_rownames("study_number_row")
mutate: new variable 'study_number_row' (character) with 2,595 unique values and 0% NA
[1] 2595 1212
We make a SummarizedExperiment for the RNAseq data. We
make sure to only include the samples we need based on informed consent
and we include only the requested variables.
First, we define the variables we need.
# Baseline table variables
basetable_vars = c("Hospital", "ORyear", "Artery_summary",
"Age", "Gender",
# "TC_finalCU", "LDL_finalCU", "HDL_finalCU", "TG_finalCU",
"TC_final", "LDL_final", "HDL_final", "TG_final",
# "hsCRP_plasma",
"systolic", "diastoli", "GFR_MDRD", "BMI",
"KDOQI", "BMI_WHO",
"SmokerStatus", "AlcoholUse",
"DiabetesStatus",
"Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs",
"Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
"Symptoms.Update2G", "Symptoms.Update3G",
"restenos", "stenose",
"CAD_history", "PAOD", "Peripheral.interv",
"EP_composite", "EP_composite_time", "epcom.3years",
"EP_major", "EP_major_time","epmajor.3years",
"MAC_rankNorm", "SMC_rankNorm", "Macrophages.bin", "SMC.bin",
"Neutrophils_rankNorm", "MastCells_rankNorm",
"IPH.bin", "VesselDensity_rankNorm",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index",
"PCSK9_plasma", "PCSK9_plasma_rankNorm") # this is for a sanity check
basetable_bin = c("Gender", "Artery_summary",
"KDOQI", "BMI_WHO",
"SmokerStatus", "AlcoholUse",
"DiabetesStatus",
"Hypertension.selfreport", "Hypertension.selfreportdrug", "Hypertension.composite", "Hypertension.drugs",
"Med.anticoagulants", "Med.all.antiplatelet", "Med.Statin.LLD",
"Stroke_Dx", "sympt", "Symptoms.5G", "AsymptSympt", "AsymptSympt2G",
"Symptoms.Update2G", "Symptoms.Update3G",
"restenos", "stenose",
"CAD_history", "PAOD", "Peripheral.interv",
"EP_composite", "Macrophages.bin", "SMC.bin",
"IPH.bin",
"Calc.bin", "Collagen.bin",
"Fat.bin_10", "Fat.bin_40",
"OverallPlaquePhenotype", "Plaque_Vulnerability_Index")
# basetable_bin
basetable_con = basetable_vars[!basetable_vars %in% basetable_bin]
# basetable_con
Next, we are constructing the SummarizedExperiment.
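Before subsetting the clinical data on the variable lists defined above, it can help to confirm that every requested variable actually exists as a column. A minimal base-R sketch on toy data follows; `clin`, `wanted_vars` and `wanted_bin` are hypothetical stand-ins for the clinical data frame and the variable lists.

```r
# Toy clinical data; hypothetical values
clin <- data.frame(Age    = c(70, 65),
                   Gender = c("male", "female"),
                   BMI    = c(26.1, 24.3),
                   stringsAsFactors = FALSE)

# Requested variables; "Hospital" is deliberately absent from the toy data
wanted_vars <- c("Age", "Gender", "BMI", "Hospital")
wanted_bin  <- c("Gender", "Hospital")

# Variables that were requested but do not exist as columns
missing_vars <- setdiff(wanted_vars, colnames(clin))

# Variables that can safely be selected
present_vars <- intersect(wanted_vars, colnames(clin))

# Continuous variables are the requested ones that are not categorical
wanted_con <- wanted_vars[!wanted_vars %in% wanted_bin]

# Subset on the columns that exist; drop = FALSE keeps a data frame
clin_sub <- clin[, present_vars, drop = FALSE]
missing_vars
```

Reporting `missing_vars` up front gives a clearer message than the error raised when a selection names a column that does not exist.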
* loading data ...
# this is all the data passing RNAseq quality control and UMI-corrected
# - includes 631 patients
# - after filtering on informed consent and artery type, the end sample size should be 622
# - after filtering on 'no commercial business' based on informed consent, there are fewer samples: 608
dim(aernas1_counts_raw_qc_umicorr_annotFilt)
[1] 21835 631
[1] 21835 622
* making a SummarizedExperiment ...
> getting counts
> meta data
temp_coldat <- data.frame(STUDY_NUMBER = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS1", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "622",
InformedConsent = "ACADEMIC",
row.names = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]))
cat(" > clinical data\n")
> clinical data
# bulkRNA_meta_clin_COMMERCIAL <- subset(bulkRNA_meta_clin, select = c("study_number", basetable_vars))
aernas1_meta_clin_ACADEMIC <- subset(aernas1_meta_clin, select = c("study_number", basetable_vars))
# temp_coldat_clin <- merge(temp_coldat, bulkRNA_meta_clin_COMMERCIAL, by.x = "STUDY_NUMBER", by.y = "study_number", sort = FALSE, all.x = TRUE)
temp_coldat_clin <- merge(temp_coldat, aernas1_meta_clin_ACADEMIC, by.x = "STUDY_NUMBER", by.y = "study_number", sort = FALSE, all.x = TRUE)
rownames(temp_coldat_clin) <- temp_coldat_clin$STUDY_NUMBER
dim(temp_coldat_clin)
[1] 622 69
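The sample annotation above selects the sample columns by position (`10:631`). A hedged alternative sketch selects them by name, assuming the annotation column names are known; the toy table `counts_tbl` and the vector `annot_cols` are hypothetical.

```r
# Toy count table: annotation columns first, then one column per sample
counts_tbl <- data.frame(ENSEMBL_gene_ID = c("ENSG_A", "ENSG_B"),
                         chr             = c("1", "X"),
                         ae1             = c(5L, 0L),
                         ae2             = c(3L, 7L),
                         stringsAsFactors = FALSE)

# Everything that is not an annotation column is treated as a sample column
annot_cols  <- c("ENSEMBL_gene_ID", "chr")
sample_cols <- setdiff(colnames(counts_tbl), annot_cols)

# Sample annotation; SampleSize is derived from the data instead of typed in
coldat <- data.frame(STUDY_NUMBER = sample_cols,
                     StudyName    = "toy",
                     SampleSize   = as.character(length(sample_cols)),
                     row.names    = sample_cols,
                     stringsAsFactors = FALSE)

# Count matrix with genes as rows and samples as columns
counts <- as.matrix(counts_tbl[, sample_cols, drop = FALSE])
rownames(counts) <- counts_tbl$ENSEMBL_gene_ID

# The annotation rows must line up with the count columns
stopifnot(identical(rownames(coldat), colnames(counts)))
```

Deriving both the column selection and `SampleSize` from the data means neither has to be edited by hand when the number of samples changes.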
> construction of the SE
(AERNAS1SE <- SummarizedExperiment(assays = list(counts = as.matrix(aernas1_counts)),
colData = temp_coldat_clin,
rowRanges = aernas1_bulkRNA_rowRanges,
metadata = "Athero-Express RNAseq Study 1: bulk RNA sequencing in carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected"))
class: RangedSummarizedExperiment
dim: 21835 622
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(622): ae1 ae1026 ... ae998 ae999
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
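The construction above can be illustrated on a toy object. The following is a minimal sketch, assuming the Bioconductor package SummarizedExperiment is installed; all values and the metadata entry name `description` are hypothetical. It passes the metadata as a named list, whereas the object printed above reports `metadata(1): ''`, a single entry without a name.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges, IRanges and S4Vectors

# Toy count matrix: 3 genes (rows) by 2 samples (columns)
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Genomic ranges for the genes, named by gene identifier
rr <- GRanges(seqnames = c("1", "20", "X"),
              ranges   = IRanges(start = c(1, 100, 200), end = c(50, 150, 250)),
              strand   = c("+", "-", "+"),
              feature_id = rownames(counts))
names(rr) <- rr$feature_id

# Sample annotation; row names must equal the column names of the counts
cd <- DataFrame(STUDY_NUMBER = colnames(counts),
                StudyName    = "toy",
                row.names    = colnames(counts))

# Metadata as a named list, so the entry can be retrieved by name later
se <- SummarizedExperiment(assays    = list(counts = counts),
                           colData   = cd,
                           rowRanges = rr,
                           metadata  = list(description = "toy example"))
se
```

With a named list, the description can be read back with `metadata(se)$description`.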
* removing intermediate files ...
* loading data ...
# this is all the data passing RNAseq quality control and UMI-corrected
# - includes 481 patients
# - after filtering on informed consent and artery type, the end sample size should be 471
# - after filtering on 'no commercial business' based on informed consent, there are fewer samples: [not done]
dim(aernas2_counts_raw_qc_umicorr_annotFilt)
[1] 21843 480
[1] 21843 471
* making a SummarizedExperiment ...
> getting counts
> meta data
temp_coldat <- data.frame(STUDY_NUMBER = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS2", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "471",
InformedConsent = "ACADEMIC",
row.names = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]))
cat(" > clinical data\n")
 > clinical data
# bulkRNA_meta_clin_COMMERCIAL <- subset(bulkRNA_meta_clin, select = c("study_number", basetable_vars))
aernas2_meta_clin_ACADEMIC <- subset(aernas2_meta_clin, select = c("STUDY_NUMBER", basetable_vars))
# temp_coldat_clin <- merge(temp_coldat, bulkRNA_meta_clin_COMMERCIAL, by.x = "STUDY_NUMBER", by.y = "study_number", sort = FALSE, all.x = TRUE)
temp_coldat_clin <- merge(temp_coldat, aernas2_meta_clin_ACADEMIC, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", sort = FALSE, all.x = TRUE)
rownames(temp_coldat_clin) <- temp_coldat_clin$STUDY_NUMBER
dim(temp_coldat_clin)
[1] 471 69
> construction of the SE
(AERNAS2SE <- SummarizedExperiment(assays = list(counts = as.matrix(aernas2_counts)),
colData = temp_coldat_clin,
rowRanges = aernas2_bulkRNA_rowRanges,
metadata = "Athero-Express RNAseq Study 2: bulk RNA sequencing in carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected"))
class: RangedSummarizedExperiment
dim: 21843 471
metadata(1): ''
assays(1): counts
rownames(21843): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000281861
rowData names(2): feature_id symbol
colnames(471): ae105 ae1078 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
* removing intermediate files ...
Here we combine the two datasets, making sure that we retain information on which sample belongs to which study.
* loading data ...
temp1_coldat <- data.frame(STUDY_NUMBER = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS1", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "622",
InformedConsent = "ACADEMIC",
row.names = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]))
temp2_coldat <- data.frame(STUDY_NUMBER = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS2", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "471",
InformedConsent = "ACADEMIC",
row.names = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]))
cat("* checking whether each list of samples is unique ...\n")
* checking whether each list of samples is unique ...
setdif_samples_AERNAS1vsAERNAS2 <- setdiff(temp1_coldat$STUDY_NUMBER, temp2_coldat$STUDY_NUMBER)
setdif_samples_AERNAS2vsAERNAS1 <- setdiff(temp2_coldat$STUDY_NUMBER, temp1_coldat$STUDY_NUMBER)
length(setdif_samples_AERNAS1vsAERNAS2) # 622
[1] 622
[1] 471
[1] 1093 10
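The uniqueness check above works because each `setdiff()` result is as long as its first input, which can only happen when the two studies share no sample. The sketch below uses made-up study numbers (not real patients) to show the same check stated directly with `intersect()`, followed by one way the two per-study tables could be stacked. The use of `rbind()` is an assumption, since the step that creates `temp_coldat_merge` is not shown in this section.

```r
# Toy sample IDs (hypothetical; not real study numbers)
ids_study1 <- c("ae1", "ae2", "ae3")
ids_study2 <- c("ae4", "ae5")

# setdiff() returns the IDs in the first vector that are absent from the second;
# without overlap, each result is as long as its first argument
length(setdiff(ids_study1, ids_study2)) == length(ids_study1)
length(setdiff(ids_study2, ids_study1)) == length(ids_study2)

# intersect() states the same check directly: there should be no shared IDs
shared_ids <- intersect(ids_study1, ids_study2)
stopifnot(length(shared_ids) == 0)

# Stack the per-study sample tables; StudyName keeps track of the origin of each sample
coldat1 <- data.frame(STUDY_NUMBER = ids_study1, StudyName = "AERNAS1",
                      row.names = ids_study1, stringsAsFactors = FALSE)
coldat2 <- data.frame(STUDY_NUMBER = ids_study2, StudyName = "AERNAS2",
                      row.names = ids_study2, stringsAsFactors = FALSE)
coldat_merged <- rbind(coldat1, coldat2)
dim(coldat_merged)
```

Because the IDs do not overlap, the row names stay unique after stacking, which is what the later `rownames()` assignments rely on.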
> clinical data
combined_meta_clin_ACADEMIC <- subset(aernas2_meta_clin, select = c("STUDY_NUMBER", basetable_vars))
dim(combined_meta_clin_ACADEMIC)
[1] 2595 60
temp_coldat_clin <- merge(temp_coldat_merge, combined_meta_clin_ACADEMIC, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", sort = FALSE, all.x = TRUE)
rownames(temp_coldat_clin) <- temp_coldat_clin$STUDY_NUMBER
dim(temp_coldat_clin)
[1] 1093 69
aernas1_counts$ENSEMBL_gene_ID <- row.names(aernas1_counts)
aernas2_counts$ENSEMBL_gene_ID <- row.names(aernas2_counts)
combined_counts <- merge(aernas1_counts, aernas2_counts, by.x = "ENSEMBL_gene_ID", by.y = "ENSEMBL_gene_ID", sort = FALSE, all.x = TRUE)
dim(combined_counts)
[1] 21835 1094
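The merged table has 21,835 rows, the gene count of AERNAS1, whereas AERNAS2 on its own has 21,843. With `all.x = TRUE`, `merge()` keeps every row of the first table and drops rows that occur only in the second. The toy sketch below (gene IDs and counts are made up) illustrates both effects; whether any AERNAS1 gene is actually absent from AERNAS2 is not shown in this notebook, so the `NA` case is included only as something that may be worth checking.

```r
# Toy count tables (hypothetical gene IDs and counts)
counts1 <- data.frame(ENSEMBL_gene_ID = c("g1", "g2", "g3"), ae1 = c(5, 0, 2),
                      stringsAsFactors = FALSE)
counts2 <- data.frame(ENSEMBL_gene_ID = c("g2", "g3", "g4"), ae9 = c(1, 7, 3),
                      stringsAsFactors = FALSE)

# Left join: all rows of counts1 are kept, rows found only in counts2 are dropped
left_merged <- merge(counts1, counts2, by = "ENSEMBL_gene_ID", sort = FALSE, all.x = TRUE)
left_merged

# Genes present only in the second table are lost ...
genes_dropped <- setdiff(counts2$ENSEMBL_gene_ID, left_merged$ENSEMBL_gene_ID)
# ... and genes present only in the first table get NA counts for the second study
genes_with_na <- left_merged$ENSEMBL_gene_ID[is.na(left_merged$ae9)]
genes_dropped
genes_with_na
```

Rows with `NA` counts would need attention before the matrix is handed to a count-based model.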
For annotations we use the annotables package from Stephen Turner.
Checking existence of duplicate ENSEMBL IDs - there shouldn't be any.
character(0)
Annotating combined data with b38.
[1] 21835 1094
combined_counts_annot <- combined_counts %>%
# arrange(p.adjusted) %>%
# head(20) %>%
inner_join(grch38, by=c("ENSEMBL_gene_ID"="ensgene")) %>%
# select(gene, estimate, p.adjusted, symbol, description) %>%
relocate(entrez, symbol, chr, start, end, strand, biotype, description,
.before = ae1) %>% # put everything before sample ae1
dplyr::filter(duplicated(ENSEMBL_gene_ID) == FALSE)
inner_join: added 8 columns (entrez, symbol, chr, start, end, …)
> rows only in x ( 0)
> rows only in grch38 (52,540)
> matched rows 22,578 (includes duplicates)
> ========
> rows total 22,578
relocate: columns reordered (ENSEMBL_gene_ID, entrez, symbol, chr, start, …)
character(0)
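The join log reports 22,578 matched rows, more than the 21,835 genes in the count table, because some Ensembl IDs have more than one annotation row. The `duplicated()` filter then keeps the first match per gene. A minimal base R sketch of that behaviour, with made-up IDs and Entrez values (the notebook itself uses dplyr):

```r
# Toy counts and a toy annotation table in which g2 has two annotation rows
counts <- data.frame(ENSEMBL_gene_ID = c("g1", "g2"), ae1 = c(5, 2),
                     stringsAsFactors = FALSE)
annot <- data.frame(ensgene = c("g1", "g2", "g2"), entrez = c(11, 21, 22),
                    stringsAsFactors = FALSE)

# An inner join returns one row per match, so g2 appears twice
joined <- merge(counts, annot, by.x = "ENSEMBL_gene_ID", by.y = "ensgene")
nrow(joined)

# Keep only the first row for each gene ID
deduped <- joined[!duplicated(joined$ENSEMBL_gene_ID), ]
nrow(deduped)
```

Which annotation row survives depends on row order, so this is a pragmatic choice rather than a curated one.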
Creating GRanges combined data with b38.
rownames(combined_counts) <- combined_counts$ENSEMBL_gene_ID ## assign rownames
combined_counts$ENSEMBL_gene_ID <- NULL
id <- rownames(combined_counts) ## the ENSEMBL_gene_ID column was just removed, so check the row names for duplicates instead
id[ id %in% id[duplicated(id)] ]
combined_counts_rowRanges <- GRanges(combined_counts_annot$chr, ## construct a GRanges object containing 4 columns (seqnames, ranges, strand, seqinfo) plus a metadata column (feature_id): this will be the 'rowRanges' bit
IRanges(combined_counts_annot$start, combined_counts_annot$end),
strand = combined_counts_annot$strand,
feature_id = combined_counts_annot$ENSEMBL_gene_ID) #, df$pid)
names(combined_counts_rowRanges) <- combined_counts_rowRanges$feature_id
# ?org.Hs.eg.db
# ?AnnotationDb
combined_counts_rowRanges$symbol <- mapIds(org.Hs.eg.db,
keys = combined_counts_rowRanges$feature_id,
column = "SYMBOL",
keytype = "ENSEMBL",
multiVals = "first")
'select()' returned 1:many mapping between keys and columns
# Reference: https://shiring.github.io/genome/2016/10/23/AnnotationDbi
# gene dataframe for EnsDb.Hsapiens.v86 # https://github.com/stuart-lab/signac/issues/79
combined_counts_EnsDb <- ensembldb::select(EnsDb.Hsapiens.v86, keys = combined_counts_rowRanges$feature_id,
columns = c("ENTREZID", "SYMBOL", "GENEBIOTYPE"), keytype = "GENEID")
colnames(combined_counts_EnsDb) <- c("Ensembl", "Entrez", "HGNC", "GENEBIOTYPE")
colnames(combined_counts_EnsDb) <- paste(colnames(combined_counts_EnsDb), "GRCh38p13_EnsDb86", sep = "_")
head(combined_counts_EnsDb)
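# NOTE: after the two colnames() calls above, the columns are named 'Ensembl_GRCh38p13_EnsDb86' and 'GENEBIOTYPE_GRCh38p13_EnsDb86'.
# The names used below do not match them, so both lookups return NULL and no GENEBIOTYPE column is added
# (the printed object indeed shows only 2 metadata columns).
combined_counts_rowRanges$GENEBIOTYPE_EnsDb86 <- combined_counts_EnsDb$GENEBIOTYPE_EnsDb86[match(combined_counts_rowRanges$feature_id, combined_counts_EnsDb$Ensembl_EnsDb86)]
combined_counts_rowRanges
GRanges object with 21835 ranges and 2 metadata columns: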
seqnames ranges strand | feature_id symbol
<Rle> <IRanges> <Rle> | <character> <character>
ENSG00000000005 X 100584936-100599885 + | ENSG00000000005 TNMD
ENSG00000000419 20 50934867-50959140 - | ENSG00000000419 DPM1
ENSG00000000457 1 169849631-169894267 - | ENSG00000000457 SCYL3
ENSG00000000460 1 169662007-169854080 + | ENSG00000000460 FIRRM
ENSG00000000938 1 27612064-27635185 - | ENSG00000000938 FGR
... ... ... ... . ... ...
ENSG00000290203 15 68930504-69062743 + | ENSG00000290203 NOX5
ENSG00000290292 14 23272422-23299796 - | ENSG00000290292 HOMEZ
ENSG00000290320 17 32895433-32906586 + | ENSG00000290320 H2BN1
ENSG00000291237 6 159669069-159762529 - | ENSG00000291237 SOD2
ENSG00000274714 CHR_HSCHR19KIR_FH06_.. 54819131-54834528 + | ENSG00000274714 KIR2DS4
-------
seqinfo: 331 sequences from an unspecified genome; no seqlengths
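The printed object lists two metadata columns, `feature_id` and `symbol`. After the `paste()` rename, the columns of the lookup table carry the full `_GRCh38p13_EnsDb86` suffix, so a `match()`-based lookup has to use those full names. The sketch below shows this with plain data frames; the gene IDs and biotypes are made up and the object names are hypothetical stand-ins for the EnsDb result.

```r
# Toy lookup table, renamed the same way as in the notebook
lookup <- data.frame(Ensembl = c("g2", "g1"),
                     GENEBIOTYPE = c("lncRNA", "protein_coding"),
                     stringsAsFactors = FALSE)
colnames(lookup) <- paste(colnames(lookup), "GRCh38p13_EnsDb86", sep = "_")
colnames(lookup)

feature_id <- c("g1", "g2", "g3")

# A name that matches no column (not even as a prefix) silently yields NULL
is.null(lookup$GENEBIOTYPE_EnsDb86)

# With the full column names, match() returns for each feature the row of the lookup table
idx <- match(feature_id, lookup$Ensembl_GRCh38p13_EnsDb86)
biotype <- lookup$GENEBIOTYPE_GRCh38p13_EnsDb86[idx]
biotype  # features without a match (g3) become NA
```

Assigning `NULL` to a metadata column adds nothing, which is why a name mismatch of this kind goes unnoticed unless the resulting object is inspected.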
Construction of the SE
(AERNAScomboSE <- SummarizedExperiment(assays = list(counts = as.matrix(combined_counts)),
colData = temp_coldat_clin,
rowRanges = combined_counts_rowRanges,
metadata = "Athero-Express RNAseq Study Combined: bulk RNA sequencing in carotid plaques across two experiments, AERNAS1 (n=622) and AERNAS2 (n=471). Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected"))
class: RangedSummarizedExperiment
dim: 21835 1093
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(1093): ae1 ae1026 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
* removing intermediate files ...
rm(temp1_coldat, temp2_coldat, temp_coldat_clin) # we don't delete 'temp_coldat_merge' because we need it later on
Do the study numbers correspond between metadata and expression data?
aernas1_counts$ENSEMBL_gene_ID <- NULL
aernas2_counts$ENSEMBL_gene_ID <- NULL
## check whether rownames metadata and colnames counts are identical
all(colnames(AERNAS1SE) == colnames(aernas1_counts))
[1] TRUE
[1] TRUE
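An element-wise `==` comparison recycles when the two vectors differ in length, so a stricter test can be useful. The sketch below, on toy data with hypothetical sample IDs, separates two questions: are the same samples present, and are they in the same order? It then reorders the count matrix to follow the metadata.

```r
# Toy metadata (one row per sample) and counts (one column per sample, different order)
meta <- data.frame(Gender = c("male", "female", "male"),
                   row.names = c("ae1", "ae2", "ae3"))
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("ae3", "ae1", "ae2")))

# identical() is TRUE only when IDs, order and length all agree
same_order <- identical(rownames(meta), colnames(counts))
# setequal() ignores order and only asks whether the same samples are present
same_set <- setequal(rownames(meta), colnames(counts))

# Reorder the count columns so that they follow the metadata rows
counts_aligned <- counts[, rownames(meta), drop = FALSE]
stopifnot(identical(rownames(meta), colnames(counts_aligned)))
```

Indexing by name rather than by position keeps each count column attached to the correct patient.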
So, now we have raw counts for all patients included in the bulk RNAseq data, with all clinical data annotated to them. Some patients may have missing values for certain variables:
# We know that some RNAseq patients have missing values for some variables
which(is.na(AERNAS1SE$Gender))
missing_values_aernas1 <- which(is.na(AERNAS1SE$Gender))
missing_values_aernas1
which(is.na(AERNAS2SE$Gender))
missing_values_aernas2 <- which(is.na(AERNAS2SE$Gender))
missing_values_aernas2
There is no need to remove samples with missing values for a given variable, since we will create the DESeq2 object with an empty model.
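The check above looks at one variable at a time. As a small sketch on made-up clinical data (the column names mirror the notebook, the values are invented), missingness can also be summarised for all variables at once:

```r
# Toy clinical table with some missing values
clin <- data.frame(Gender = c("male", NA, "female", "male"),
                   Age = c(70, 65, NA, NA),
                   BMI = c(26.1, 24.3, 29.8, 27.5),
                   stringsAsFactors = FALSE)

# Number and percentage of missing values per variable
n_missing <- colSums(is.na(clin))
pct_missing <- round(100 * n_missing / nrow(clin), 1)
n_missing
pct_missing

# Row positions of the samples with a missing value for one variable
which(is.na(clin$Gender))
```

Such a summary helps to decide later which covariates can enter a model without losing many samples.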
Showing the baseline table for the RNAseq data in 622 CEA patients with informed consent.
cat("====================================================================================================\n")====================================================================================================
SELECTION THE SHIZZLE
Warning: `as.tibble()` was deprecated in tibble 2.0.0.
Please use `as_tibble()` instead.
The signature and semantics have changed, see `?as_tibble`.
- sanity checking PRIOR to selection
library(data.table)
require(labelled)
ae.gender <- to_factor(AERNAS1SEClinData$Gender)
ae.hospital <- to_factor(AERNAS1SEClinData$Hospital)
table(ae.gender, ae.hospital, dnn = c("Sex", "Hospital"), useNA = "ifany")
Hospital
Sex St. Antonius, Nieuwegein UMC Utrecht
female 99 55
male 259 209
ae.artery <- to_factor(AERNAS1SEClinData$Artery_summary)
table(ae.artery, ae.gender, dnn = c("Artery", "Sex"), useNA = "ifany") # dnn follows the argument order: rows are arteries, columns are sexes
Sex
Artery female male
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0
carotid (left & right) 154 466
femoral/iliac (left, right or both sides) 0 0
other carotid arteries (common, external) 0 2
carotid bypass and injury (left, right or both sides) 0 0
aneurysmata (carotid & femoral) 0 0
aorta 0 0
other arteries (renal, popliteal, vertebral) 0 0
femoral bypass, angioseal and injury (left, right or both sides) 0 0
[1] 622 69
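In a cross-tabulation, `dnn` labels the dimensions in the order of the arguments: the first name goes to the rows (first factor) and the second to the columns. A toy sketch with made-up values:

```r
# Toy factors (hypothetical values)
artery <- factor(c("carotid", "carotid", "other carotid", "carotid"))
sex <- factor(c("female", "male", "male", "male"))

# The first factor defines the rows, the second the columns; dnn follows that order
tab <- table(artery, sex, dnn = c("Artery", "Sex"), useNA = "ifany")
tab
names(dimnames(tab))
```

Keeping the labels in argument order avoids a table whose row and column headers are swapped.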
cat("===========================================================================================\n")
===========================================================================================
CREATE BASELINE TABLE
# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AERNAS1SEClinData.CEA.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
# strata = "Gender",
data = AERNAS1SEClinData, includeNA = TRUE),
nonnormal = c(),
quote = FALSE, showAllLevels = TRUE,
format = "p",
contDigits = 3)[,1:2]
level Overall
n 622
Hospital (%) St. Antonius, Nieuwegein 57.6
UMC Utrecht 42.4
ORyear (%) No data available/missing 0.0
2002 5.0
2003 9.8
2004 10.6
2005 13.2
2006 13.7
2007 10.8
2008 10.1
2009 10.9
2010 5.5
2011 5.0
2012 3.5
2013 0.8
2014 0.5
2015 0.5
2016 0.2
2017 0.0
2018 0.0
2019 0.0
2020 0.0
2021 0.0
2022 0.0
Artery_summary (%) No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0.0
carotid (left & right) 99.7
femoral/iliac (left, right or both sides) 0.0
other carotid arteries (common, external) 0.3
carotid bypass and injury (left, right or both sides) 0.0
aneurysmata (carotid & femoral) 0.0
aorta 0.0
other arteries (renal, popliteal, vertebral) 0.0
femoral bypass, angioseal and injury (left, right or both sides) 0.0
Age (mean (SD)) 68.503 (8.898)
Gender (%) female 24.8
male 75.2
TC_final (mean (SD)) 4.662 (1.253)
LDL_final (mean (SD)) 2.776 (1.042)
HDL_final (mean (SD)) 1.143 (0.374)
TG_final (mean (SD)) 1.609 (0.939)
systolic (mean (SD)) 154.375 (25.001)
diastoli (mean (SD)) 82.442 (13.443)
GFR_MDRD (mean (SD)) 73.004 (20.382)
BMI (mean (SD)) 26.608 (3.760)
KDOQI (%) No data available/missing 0.0
Normal kidney function 18.2
CKD 2 (Mild) 55.5
CKD 3 (Moderate) 23.6
CKD 4 (Severe) 1.4
CKD 5 (Failure) 0.0
<NA> 1.3
BMI_WHO (%) No data available/missing 0.0
Underweight 0.8
Normal 33.3
Overweight 46.3
Obese 14.5
<NA> 5.1
SmokerStatus (%) Current smoker 35.9
Ex-smoker 44.5
Never smoked 15.9
<NA> 3.7
AlcoholUse (%) No 34.1
Yes 61.3
<NA> 4.7
DiabetesStatus (%) Control (no Diabetes Dx/Med) 78.5
Diabetes 21.5
Hypertension.selfreport (%) No data available/missing 0.0
no 27.0
yes 70.9
<NA> 2.1
Hypertension.selfreportdrug (%) No data available/missing 0.0
no 33.3
yes 64.3
<NA> 2.4
Hypertension.composite (%) No data available/missing 0.0
no 13.0
yes 87.0
Hypertension.drugs (%) No data available/missing 0.0
no 22.5
yes 77.3
<NA> 0.2
Med.anticoagulants (%) No data available/missing 0.0
no 87.6
yes 12.2
<NA> 0.2
Med.all.antiplatelet (%) No data available/missing 0.0
no 10.6
yes 89.2
<NA> 0.2
Med.Statin.LLD (%) No data available/missing 0.0
no 24.3
yes 75.6
<NA> 0.2
Stroke_Dx (%) Missing 0.0
No stroke diagnosed 75.7
Stroke diagnosed 17.7
<NA> 6.6
sympt (%) missing 0.0
Asymptomatic 12.9
TIA 40.4
minor stroke 15.0
Major stroke 9.2
Amaurosis fugax 15.6
Four vessel disease 1.9
Vertebrobasilary TIA 0.2
Retinal infarction 1.4
Symptomatic, but aspecific symtoms 2.6
Contralateral symptomatic occlusion 0.5
retinal infarction 0.2
armclaudication due to occlusion subclavian artery, CEA needed for bypass 0.0
retinal infarction + TIAs 0.0
Ocular ischemic syndrome 0.2
ischemisch glaucoom 0.0
subclavian steal syndrome 0.0
TGA 0.0
<NA> 0.2
Symptoms.5G (%) Asymptomatic 12.9
Ocular 15.8
Other 5.0
Retinal infarction 1.6
Stroke 24.1
TIA 40.5
<NA> 0.2
AsymptSympt (%) Asymptomatic 12.9
Ocular and others 22.3
Symptomatic 64.6
<NA> 0.2
AsymptSympt2G (%) Asymptomatic 12.9
Symptomatic 87.0
<NA> 0.2
Symptoms.Update2G (%) Asymptomatic 26.8
Symptomatic 68.8
<NA> 4.3
Symptoms.Update3G (%) Asymptomatic 26.8
Symptomatic 68.8
Unclear 4.3
restenos (%) missing 0.0
de novo 95.8
restenosis 1.8
stenose bij angioseal na PTCA 0.0
<NA> 2.4
stenose (%) missing 0.0
0-49% 0.3
50-70% 6.1
70-90% 43.4
90-99% 45.7
100% (Occlusion) 0.8
NA 0.0
50-99% 0.2
70-99% 0.0
99 0.0
<NA> 3.5
CAD_history (%) Missing 0.0
No history CAD 66.6
History CAD 33.4
PAOD (%) missing/no data 0.0
no 79.3
yes 20.7
Peripheral.interv (%) no 84.7
yes 15.3
EP_composite (%) No data available. 0.0
No composite endpoints 74.3
Composite endpoints 25.2
<NA> 0.5
EP_composite_time (mean (SD)) 2.649 (1.148)
epcom.3years (mean (SD)) 0.236 (0.425)
EP_major (%) No data available. 0.0
No major events (endpoints) 86.2
Major events (endpoints) 13.3
<NA> 0.5
EP_major_time (mean (SD)) 2.852 (1.018)
epmajor.3years (mean (SD)) 0.129 (0.336)
MAC_rankNorm (mean (SD)) 0.285 (0.955)
SMC_rankNorm (mean (SD)) -0.035 (0.926)
Macrophages.bin (%) no/minor 42.6
moderate/heavy 56.3
<NA> 1.1
SMC.bin (%) no/minor 31.2
moderate/heavy 67.7
<NA> 1.1
Neutrophils_rankNorm (mean (SD)) 0.256 (1.020)
MastCells_rankNorm (mean (SD)) -0.003 (1.035)
IPH.bin (%) no 38.4
yes 60.8
<NA> 0.8
VesselDensity_rankNorm (mean (SD)) 0.140 (0.945)
Calc.bin (%) no/minor 46.3
moderate/heavy 53.1
<NA> 0.6
Collagen.bin (%) no/minor 19.0
moderate/heavy 79.6
<NA> 1.4
Fat.bin_10 (%) <10% 23.2
>10% 76.2
<NA> 0.6
Fat.bin_40 (%) <40% 69.5
>40% 29.9
<NA> 0.6
OverallPlaquePhenotype (%) atheromatous 30.1
fibroatheromatous 38.1
fibrous 31.0
<NA> 0.8
Plaque_Vulnerability_Index (%) 0 6.8
1 16.6
2 26.0
3 33.3
4 11.9
5 5.5
PCSK9_plasma (mean (SD)) 32025.874 (18936.193)
PCSK9_plasma_rankNorm (mean (SD)) -0.004 (1.015)
Showing the baseline table for the RNAseq data in 471 CEA patients with informed consent.
cat("====================================================================================================\n")====================================================================================================
SELECTION THE SHIZZLE
- sanity checking PRIOR to selection
library(data.table)
require(labelled)
ae.gender <- to_factor(AERNAS2SEClinData$Gender)
ae.hospital <- to_factor(AERNAS2SEClinData$Hospital)
table(ae.gender, ae.hospital, dnn = c("Sex", "Hospital"), useNA = "ifany")
Hospital
Sex St. Antonius, Nieuwegein UMC Utrecht
female 54 101
male 92 224
ae.artery <- to_factor(AERNAS2SEClinData$Artery_summary)
table(ae.artery, ae.gender, dnn = c("Artery", "Sex"), useNA = "ifany") # dnn follows the argument order: rows are arteries, columns are sexes
Sex
Artery female male
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0
carotid (left & right) 155 314
femoral/iliac (left, right or both sides) 0 0
other carotid arteries (common, external) 0 2
carotid bypass and injury (left, right or both sides) 0 0
aneurysmata (carotid & femoral) 0 0
aorta 0 0
other arteries (renal, popliteal, vertebral) 0 0
femoral bypass, angioseal and injury (left, right or both sides) 0 0
[1] 471 69
cat("===========================================================================================\n")
===========================================================================================
CREATE BASELINE TABLE
# Create baseline tables
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AERNAS2SEClinData.CEA.tableOne = print(CreateTableOne(vars = basetable_vars,
# factorVars = basetable_bin,
# strata = "Gender",
data = AERNAS2SEClinData, includeNA = TRUE),
nonnormal = c(),
quote = FALSE, showAllLevels = TRUE,
format = "p",
contDigits = 3)[,1:2]
level Overall
n 471
Hospital (%) St. Antonius, Nieuwegein 31.0
UMC Utrecht 69.0
ORyear (%) No data available/missing 0.0
2002 1.5
2003 1.3
2004 2.3
2005 3.2
2006 4.2
2007 6.4
2008 6.6
2009 7.9
2010 8.7
2011 5.9
2012 9.6
2013 13.8
2014 14.6
2015 5.7
2016 5.7
2017 2.5
2018 0.0
2019 0.0
2020 0.0
2021 0.0
2022 0.0
Artery_summary (%) No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0.0
carotid (left & right) 99.6
femoral/iliac (left, right or both sides) 0.0
other carotid arteries (common, external) 0.4
carotid bypass and injury (left, right or both sides) 0.0
aneurysmata (carotid & femoral) 0.0
aorta 0.0
other arteries (renal, popliteal, vertebral) 0.0
femoral bypass, angioseal and injury (left, right or both sides) 0.0
Age (mean (SD)) 70.346 (8.768)
Gender (%) female 32.9
male 67.1
TC_final (mean (SD)) 4.760 (1.255)
LDL_final (mean (SD)) 2.774 (1.079)
HDL_final (mean (SD)) 1.235 (0.426)
TG_final (mean (SD)) 1.527 (0.842)
systolic (mean (SD)) 150.113 (23.711)
diastoli (mean (SD)) 80.582 (36.520)
GFR_MDRD (mean (SD)) 72.513 (20.318)
BMI (mean (SD)) 26.211 (3.955)
KDOQI (%) No data available/missing 0.0
Normal kidney function 18.9
CKD 2 (Mild) 52.0
CKD 3 (Moderate) 23.4
CKD 4 (Severe) 1.3
CKD 5 (Failure) 0.2
<NA> 4.2
BMI_WHO (%) No data available/missing 0.0
Underweight 0.8
Normal 39.1
Overweight 43.5
Obese 14.0
<NA> 2.5
SmokerStatus (%) Current smoker 32.1
Ex-smoker 49.5
Never smoked 13.8
<NA> 4.7
AlcoholUse (%) No 36.7
Yes 61.1
<NA> 2.1
DiabetesStatus (%) Control (no Diabetes Dx/Med) 74.9
Diabetes 25.1
Hypertension.selfreport (%) No data available/missing 0.0
no 23.8
yes 72.6
<NA> 3.6
Hypertension.selfreportdrug (%) No data available/missing 0.0
no 30.8
yes 63.7
<NA> 5.5
Hypertension.composite (%) No data available/missing 0.0
no 15.9
yes 84.1
Hypertension.drugs (%) No data available/missing 0.0
no 24.8
yes 75.2
Med.anticoagulants (%) No data available/missing 0.0
no 89.0
yes 11.0
Med.all.antiplatelet (%) No data available/missing 0.0
no 13.6
yes 86.4
Med.Statin.LLD (%) No data available/missing 0.0
no 18.5
yes 81.5
Stroke_Dx (%) Missing 0.0
No stroke diagnosed 70.1
Stroke diagnosed 25.3
<NA> 4.7
sympt (%) missing 0.0
Asymptomatic 6.6
TIA 40.1
minor stroke 21.0
Major stroke 7.0
Amaurosis fugax 17.4
Four vessel disease 0.8
Vertebrobasilary TIA 0.4
Retinal infarction 2.3
Symptomatic, but aspecific symtoms 3.0
Contralateral symptomatic occlusion 0.4
retinal infarction 0.2
armclaudication due to occlusion subclavian artery, CEA needed for bypass 0.0
retinal infarction + TIAs 0.0
Ocular ischemic syndrome 0.4
ischemisch glaucoom 0.0
subclavian steal syndrome 0.2
TGA 0.0
Symptoms.5G (%) Asymptomatic 6.6
Ocular 17.8
Other 4.5
Retinal infarction 2.5
Stroke 28.0
TIA 40.6
AsymptSympt (%) Asymptomatic 6.6
Ocular and others 24.8
Symptomatic 68.6
AsymptSympt2G (%) Asymptomatic 6.6
Symptomatic 93.4
Symptoms.Update2G (%) Asymptomatic 30.8
Symptomatic 64.8
<NA> 4.5
Symptoms.Update3G (%) Asymptomatic 30.8
Symptomatic 64.8
Unclear 1.3
<NA> 3.2
restenos (%) missing 0.0
de novo 95.8
restenosis 2.8
stenose bij angioseal na PTCA 0.0
<NA> 1.5
stenose (%) missing 0.0
0-49% 0.6
50-70% 7.4
70-90% 50.7
90-99% 32.9
100% (Occlusion) 1.1
NA 0.0
50-99% 0.6
70-99% 4.9
99 0.0
<NA> 1.7
CAD_history (%) Missing 0.0
No history CAD 70.7
History CAD 29.1
<NA> 0.2
PAOD (%) missing/no data 0.0
no 84.5
yes 15.3
<NA> 0.2
Peripheral.interv (%) no 83.2
yes 16.1
<NA> 0.6
EP_composite (%) No data available. 0.0
No composite endpoints 76.2
Composite endpoints 22.1
<NA> 1.7
EP_composite_time (mean (SD)) 2.562 (1.066)
epcom.3years (mean (SD)) 0.207 (0.406)
EP_major (%) No data available. 0.0
No major events (endpoints) 86.8
Major events (endpoints) 11.5
<NA> 1.7
EP_major_time (mean (SD)) 2.756 (0.954)
epmajor.3years (mean (SD)) 0.112 (0.316)
MAC_rankNorm (mean (SD)) 0.154 (0.913)
SMC_rankNorm (mean (SD)) -0.127 (0.903)
Macrophages.bin (%) no/minor 34.0
moderate/heavy 46.3
<NA> 19.7
SMC.bin (%) no/minor 31.0
moderate/heavy 49.5
<NA> 19.5
Neutrophils_rankNorm (mean (SD)) 0.045 (0.921)
MastCells_rankNorm (mean (SD)) -0.110 (0.951)
IPH.bin (%) no 34.4
yes 47.6
<NA> 18.0
VesselDensity_rankNorm (mean (SD)) -0.025 (0.992)
Calc.bin (%) no/minor 54.4
moderate/heavy 32.3
<NA> 13.4
Collagen.bin (%) no/minor 15.7
moderate/heavy 52.7
<NA> 31.6
Fat.bin_10 (%) <10% 25.1
>10% 61.6
<NA> 13.4
Fat.bin_40 (%) <40% 62.6
>40% 24.0
<NA> 13.4
OverallPlaquePhenotype (%) atheromatous 22.7
fibroatheromatous 31.8
fibrous 31.4
<NA> 14.0
Plaque_Vulnerability_Index (%) 0 21.0
1 20.2
2 17.0
3 23.1
4 14.9
5 3.8
PCSK9_plasma (mean (SD)) 32239.086 (18567.336)
PCSK9_plasma_rankNorm (mean (SD)) 0.010 (1.031)
Showing the baseline table for the RNAseq data in 1,093 CEA patients in AERNAS1 and AERNAS2 combined with informed consent.
cat("====================================================================================================\n")====================================================================================================
SELECTION THE SHIZZLE
AERNAScomboSEClinData <- as_tibble(as.data.frame(colData(AERNAScomboSE))) # as.tibble() is deprecated since tibble 2.0.0
cat("- sanity checking PRIOR to selection")
- sanity checking PRIOR to selection
library(data.table)
require(labelled)
ae.gender <- to_factor(AERNAScomboSEClinData$Gender)
ae.hospital <- to_factor(AERNAScomboSEClinData$Hospital)
table(ae.gender, ae.hospital, dnn = c("Sex", "Hospital"), useNA = "ifany")
Hospital
Sex St. Antonius, Nieuwegein UMC Utrecht
female 153 156
male 351 433
ae.artery <- to_factor(AERNAScomboSEClinData$Artery_summary)
table(ae.artery, ae.gender, dnn = c("Artery", "Sex"), useNA = "ifany") # dnn follows the argument order: rows are arteries, columns are sexes
Sex
Artery female male
No artery known (yet), no surgery (patient ill, died, exited study), re-numbered to AAA 0 0
carotid (left & right) 309 780
femoral/iliac (left, right or both sides) 0 0
other carotid arteries (common, external) 0 4
carotid bypass and injury (left, right or both sides) 0 0
aneurysmata (carotid & femoral) 0 0
aorta 0 0
other arteries (renal, popliteal, vertebral) 0 0
femoral bypass, angioseal and injury (left, right or both sides) 0 0
rm(ae.gender, ae.hospital, ae.artery)
# AERNAScomboSEClinData[1:10, 1:10]
dim(AERNAScomboSEClinData)
[1] 1093 69
cat("===========================================================================================\n")
===========================================================================================
CREATE BASELINE TABLE
# Create baseline tables
require(labelled)
AERNAScomboSEClinData$SampleType <- to_factor(AERNAScomboSEClinData$SampleType)
AERNAScomboSEClinData$RNAseqTech <- to_factor(AERNAScomboSEClinData$RNAseqTech)
AERNAScomboSEClinData$RNAseqType <- to_factor(AERNAScomboSEClinData$RNAseqType)
AERNAScomboSEClinData$RNAseqQC <- to_factor(AERNAScomboSEClinData$RNAseqQC)
AERNAScomboSEClinData$StudyType <- to_factor(AERNAScomboSEClinData$StudyType)
AERNAScomboSEClinData$StudyName <- to_factor(AERNAScomboSEClinData$StudyName)
AERNAScomboSEClinData$StudyBiobank <- to_factor(AERNAScomboSEClinData$StudyBiobank)
AERNAScomboSEClinData$SampleSize <- to_factor(AERNAScomboSEClinData$SampleSize)
AERNAScomboSEClinData$InformedConsent <- to_factor(AERNAScomboSEClinData$InformedConsent)
# http://rstudio-pubs-static.s3.amazonaws.com/13321_da314633db924dc78986a850813a50d5.html
AERNAScomboSEClinData.CEA.tableOne = print(CreateTableOne(vars =
basetable_vars,
factorVars = basetable_bin,
strata = "StudyName",
data = AERNAScomboSEClinData, includeNA = TRUE),
nonnormal = c(),
quote = FALSE, showAllLevels = TRUE,
format = "p",
contDigits = 3)[,1:5]
Stratified by StudyName
level AERNAS1 AERNAS2 p test
n 622 471
Hospital (%) St. Antonius, Nieuwegein 57.6 31.0 <0.001
UMC Utrecht 42.4 69.0
ORyear (%) No data available/missing 0.0 0.0 NaN
2002 5.0 1.5
2003 9.8 1.3
2004 10.6 2.3
2005 13.2 3.2
2006 13.7 4.2
2007 10.8 6.4
2008 10.1 6.6
2009 10.9 7.9
2010 5.5 8.7
2011 5.0 5.9
2012 3.5 9.6
2013 0.8 13.8
2014 0.5 14.6
2015 0.5 5.7
2016 0.2 5.7
2017 0.0 2.5
2018 0.0 0.0
2019 0.0 0.0
2020 0.0 0.0
2021 0.0 0.0
2022 0.0 0.0
Artery_summary (%) carotid (left & right) 99.7 99.6 1.000
other carotid arteries (common, external) 0.3 0.4
Age (mean (SD)) 68.503 (8.898) 70.346 (8.768) 0.001
Gender (%) female 24.8 32.9 0.004
male 75.2 67.1
TC_final (mean (SD)) 4.662 (1.253) 4.760 (1.255) 0.307
LDL_final (mean (SD)) 2.776 (1.042) 2.774 (1.079) 0.983
HDL_final (mean (SD)) 1.143 (0.374) 1.235 (0.426) 0.003
TG_final (mean (SD)) 1.609 (0.939) 1.527 (0.842) 0.256
systolic (mean (SD)) 154.375 (25.001) 150.113 (23.711) 0.008
diastoli (mean (SD)) 82.442 (13.443) 80.582 (36.520) 0.281
GFR_MDRD (mean (SD)) 73.004 (20.382) 72.513 (20.318) 0.697
BMI (mean (SD)) 26.608 (3.760) 26.211 (3.955) 0.097
KDOQI (%) Normal kidney function 18.2 18.9 0.047
CKD 2 (Mild) 55.5 52.0
CKD 3 (Moderate) 23.6 23.4
CKD 4 (Severe) 1.4 1.3
CKD 5 (Failure) 0.0 0.2
<NA> 1.3 4.2
BMI_WHO (%) Underweight 0.8 0.8 0.112
Normal 33.3 39.1
Overweight 46.3 43.5
Obese 14.5 14.0
<NA> 5.1 2.5
SmokerStatus (%) Current smoker 35.9 32.1 0.268
Ex-smoker 44.5 49.5
Never smoked 15.9 13.8
<NA> 3.7 4.7
AlcoholUse (%) No 34.1 36.7 0.068
Yes 61.3 61.1
<NA> 4.7 2.1
DiabetesStatus (%) Control (no Diabetes Dx/Med) 78.5 74.9 0.196
Diabetes 21.5 25.1
Hypertension.selfreport (%) no 27.0 23.8 0.178
yes 70.9 72.6
<NA> 2.1 3.6
Hypertension.selfreportdrug (%) no 33.3 30.8 0.024
yes 64.3 63.7
<NA> 2.4 5.5
Hypertension.composite (%) no 13.0 15.9 0.204
yes 87.0 84.1
Hypertension.drugs (%) no 22.5 24.8 0.462
yes 77.3 75.2
<NA> 0.2 0.0
Med.anticoagulants (%) no 87.6 89.0 0.569
yes 12.2 11.0
<NA> 0.2 0.0
Med.all.antiplatelet (%) no 10.6 13.6 0.224
yes 89.2 86.4
<NA> 0.2 0.0
Med.Statin.LLD (%) no 24.3 18.5 0.047
yes 75.6 81.5
<NA> 0.2 0.0
Stroke_Dx (%) No stroke diagnosed 75.7 70.1 0.006
Stroke diagnosed 17.7 25.3
<NA> 6.6 4.7
sympt (%) Asymptomatic 12.9 6.6 0.023
TIA 40.4 40.1
minor stroke 15.0 21.0
Major stroke 9.2 7.0
Amaurosis fugax 15.6 17.4
Four vessel disease 1.9 0.8
Vertebrobasilary TIA 0.2 0.4
Retinal infarction 1.4 2.3
Symptomatic, but aspecific symtoms 2.6 3.0
Contralateral symptomatic occlusion 0.5 0.4
retinal infarction 0.2 0.2
Ocular ischemic syndrome 0.2 0.4
subclavian steal syndrome 0.0 0.2
<NA> 0.2 0.0
Symptoms.5G (%) Asymptomatic 12.9 6.6 0.022
Ocular 15.8 17.8
Other 5.0 4.5
Retinal infarction 1.6 2.5
Stroke 24.1 28.0
TIA 40.5 40.6
<NA> 0.2 0.0
AsymptSympt (%) Asymptomatic 12.9 6.6 0.006
Ocular and others 22.3 24.8
Symptomatic 64.6 68.6
<NA> 0.2 0.0
AsymptSympt2G (%) Asymptomatic 12.9 6.6 0.002
Symptomatic 87.0 93.4
<NA> 0.2 0.0
Symptoms.Update2G (%) Asymptomatic 26.8 30.8 0.346
Symptomatic 68.8 64.8
<NA> 4.3 4.5
Symptoms.Update3G (%) Asymptomatic 26.8 30.8 <0.001
Symptomatic 68.8 64.8
Unclear 4.3 1.3
<NA> 0.0 3.2
restenos (%) de novo 95.8 95.8 0.310
restenosis 1.8 2.8
<NA> 2.4 1.5
stenose (%) 0-49% 0.3 0.6 <0.001
50-70% 6.1 7.4
70-90% 43.4 50.7
90-99% 45.7 32.9
100% (Occlusion) 0.8 1.1
50-99% 0.2 0.6
70-99% 0.0 4.9
<NA> 3.5 1.7
CAD_history (%) No history CAD 66.6 70.7 0.165
History CAD 33.4 29.1
<NA> 0.0 0.2
PAOD (%) no 79.3 84.5 0.038
yes 20.7 15.3
<NA> 0.0 0.2
Peripheral.interv (%) no 84.7 83.2 0.125
yes 15.3 16.1
<NA> 0.0 0.6
EP_composite (%) No composite endpoints 74.3 76.2 0.074
Composite endpoints 25.2 22.1
<NA> 0.5 1.7
EP_composite_time (mean (SD)) 2.649 (1.148) 2.562 (1.066) 0.202
epcom.3years (mean (SD)) 0.236 (0.425) 0.207 (0.406) 0.266
EP_major (%) No data available. 0.0 0.0 NaN
No major events (endpoints) 86.2 86.8
Major events (endpoints) 13.3 11.5
<NA> 0.5 1.7
EP_major_time (mean (SD)) 2.852 (1.018) 2.756 (0.954) 0.117
epmajor.3years (mean (SD)) 0.129 (0.336) 0.112 (0.316) 0.400
MAC_rankNorm (mean (SD)) 0.285 (0.955) 0.154 (0.913) 0.060
SMC_rankNorm (mean (SD)) -0.035 (0.926) -0.127 (0.903) 0.172
Macrophages.bin (%) no/minor 42.6 34.0 <0.001
moderate/heavy 56.3 46.3
<NA> 1.1 19.7
SMC.bin (%) no/minor 31.2 31.0 <0.001
moderate/heavy 67.7 49.5
<NA> 1.1 19.5
Neutrophils_rankNorm (mean (SD)) 0.256 (1.020) 0.045 (0.921) 0.312
MastCells_rankNorm (mean (SD)) -0.003 (1.035) -0.110 (0.951) 0.640
IPH.bin (%) no 38.4 34.4 <0.001
yes 60.8 47.6
<NA> 0.8 18.0
VesselDensity_rankNorm (mean (SD)) 0.140 (0.945) -0.025 (0.992) 0.024
Calc.bin (%) no/minor 46.3 54.4 <0.001
moderate/heavy 53.1 32.3
<NA> 0.6 13.4
Collagen.bin (%) no/minor 19.0 15.7 <0.001
moderate/heavy 79.6 52.7
<NA> 1.4 31.6
Fat.bin_10 (%) <10% 23.2 25.1 <0.001
>10% 76.2 61.6
<NA> 0.6 13.4
Fat.bin_40 (%) <40% 69.5 62.6 <0.001
>40% 29.9 24.0
<NA> 0.6 13.4
OverallPlaquePhenotype (%) atheromatous 30.1 22.7 <0.001
fibroatheromatous 38.1 31.8
fibrous 31.0 31.4
<NA> 0.8 14.0
Plaque_Vulnerability_Index (%) 0 6.8 21.0 <0.001
1 16.6 20.2
2 26.0 17.0
3 33.3 23.1
4 11.9 14.9
5 5.5 3.8
PCSK9_plasma (mean (SD)) 32025.874 (18936.193) 32239.086 (18567.336) 0.885
PCSK9_plasma_rankNorm (mean (SD)) -0.004 (1.015) 0.010 (1.031) 0.863
Writing the baseline tables to Excel format.
# Write basetable
require(openxlsx)
# write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNAS1.CEA.608pts.after_qc.IC_commercial.BaselineTable.xlsx"),
# format(AERNAS1SEClinData.CEA.tableOne, digits = 5, scientific = FALSE) ,
# rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
#
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNAS1.CEA.622pts.after_qc.IC_academic.BaselineTable.xlsx"),
format(as.data.frame(AERNAS1SEClinData.CEA.tableOne), digits = 5, scientific = FALSE) ,
rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
# Write basetable
require(openxlsx)
# write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNAS2.CEA.608pts.after_qc.IC_commercial.BaselineTable.xlsx"),
# format(AERNAS2SEClinData.CEA.tableOne, digits = 5, scientific = FALSE) ,
# rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
#
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNAS2.CEA.471pts.after_qc.IC_academic.BaselineTable.xlsx"),
format(as.data.frame(AERNAS2SEClinData.CEA.tableOne), digits = 5, scientific = FALSE) ,
rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
# Write basetable
require(openxlsx)
# write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNASCombo.CEA.1093pts.after_qc.IC_commercial.BaselineTable.xlsx"),
# format(AERNAScomboSEClinData.CEA.tableOne, digits = 5, scientific = FALSE) ,
# rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
#
write.xlsx(file = paste0(BASELINE_loc, "/",Today,".",PROJECTNAME,".AERNASCombo.CEA.1093pts.after_qc.IC_academic.BaselineTable.xlsx"),
format(as.data.frame(AERNAScomboSEClinData.CEA.tableOne), digits = 5, scientific = FALSE) ,
rowNames = TRUE, colNames = TRUE, overwrite = TRUE)
From here we can analyze whether specific genes differ between groups, or run a differential expression (DE) analysis on the entire gene set and then select our genes of interest. Let’s start with the latter.
The dds raw counts need normalization and log transformation first.
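As a rough illustration of what the size-factor step does, the sketch below computes median-of-ratios size factors on a toy matrix in base R. The helper `median_ratio_size_factors()` and the toy values are assumptions made for illustration; the notebook itself relies on `estimateSizeFactors()` from DESeq2, which may differ in details.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
toy_counts <- matrix(c(10, 20, 40,
                       100, 200, 400,
                       5, 10, 20,
                       50, 100, 200),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hypothetical helper: median-of-ratios size factors, base R only.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean
  keep <- is.finite(log_geo_means)         # drop genes with a zero count in any sample
  apply(counts, 2, function(sample_counts) {
    # Median ratio of this sample to the per-gene geometric mean
    exp(median((log(sample_counts) - log_geo_means)[keep]))
  })
}

size_factors <- median_ratio_size_factors(toy_counts)

# Normalized counts: divide each sample (column) by its size factor.
toy_normalized <- sweep(toy_counts, 2, size_factors, "/")
```

In this toy matrix each sample is an exact multiple of the others, so after dividing by the size factors all three columns become identical.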
AERNA1dds <- DESeqDataSet(AERNAS1SE, design = ~ 1)
# Determine the size factors to use for normalization
AERNA1dds <- estimateSizeFactors(AERNA1dds)
# sizeFactors(AERNA1dds)
# Extract the normalized counts
normalized_counts <- counts(AERNA1dds, normalized = TRUE)
# head(normalized_counts)
# Log transform counts for QC
AERNA1vsd <- vst(AERNA1dds, blind = TRUE)
# There is a message stating the following.
#
# -- note: fitType='parametric', but the dispersion trend was not well captured by the
# function: y = a/x + b, and a local regression fit was automatically substituted.
# specify fitType='local' or 'mean' to avoid this message next time.
#
# No action is required.
#
# For more information check: https://www.biostars.org/p/119115/
We will create a list of samples that should be included based on CEA and the proper informed consent (‘academic’). We will save the SummarizedExperiment as an RDS file for easy loading. The clinical data will also be saved as a separate CSV file.
> meta data
temp_coldat <- data.frame(STUDY_NUMBER = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS1", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "622",
InformedConsent = "ACADEMIC",
row.names = names(aernas1_counts_raw_qc_umicorr_annotFilt[,10:631]))
cat(" > clinical data\n")
 > clinical data
# bulkRNA_meta_clin_COMMERCIAL <- subset(bulkRNA_meta_clin, select = c("study_number", basetable_vars))
aernas1_meta_clin_ACADEMIC <- subset(aernas1_meta_clin, select = c("study_number", basetable_vars))
# temp_coldat_clin <- merge(temp_coldat, bulkRNA_meta_clin_COMMERCIAL, by.x = "STUDY_NUMBER", by.y = "study_number", sort = FALSE, all.x = TRUE)
temp_coldat_clin <- merge(temp_coldat, aernas1_meta_clin_ACADEMIC, by.x = "STUDY_NUMBER", by.y = "study_number", sort = FALSE, all.x = TRUE)
rownames(temp_coldat_clin) <- temp_coldat_clin$STUDY_NUMBER
dim(temp_coldat_clin)
[1] 622 69
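With `sort = FALSE`, `merge()` returns rows in an unspecified order, so it is worth confirming that the merged sample annotation still follows the column order of the count matrix. The sketch below shows one way to restore and check that order. It uses toy data, and the sample IDs and the `Age` column are placeholders, not values from the study.

```r
# Toy sample annotation in count-matrix column order (hypothetical IDs).
count_columns <- c("ae3", "ae1", "ae2")
toy_coldat <- data.frame(STUDY_NUMBER = count_columns, SampleType = "plaque")

# Toy clinical data in a different order, with one sample (ae2) missing.
toy_clin <- data.frame(study_number = c("ae1", "ae3"), Age = c(70, 65))

merged <- merge(toy_coldat, toy_clin,
                by.x = "STUDY_NUMBER", by.y = "study_number",
                sort = FALSE, all.x = TRUE)

# Restore the count-matrix column order explicitly, then verify it.
merged <- merged[match(count_columns, merged$STUDY_NUMBER), ]
rownames(merged) <- merged$STUDY_NUMBER
stopifnot(identical(rownames(merged), count_columns))
```

Samples without clinical data are kept by `all.x = TRUE` and simply carry `NA` in the clinical columns.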
temp <- as_tibble(subset(colData(AERNAS1SE), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.622pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS1SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.622pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS1SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.622pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS1SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.622pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Next, we store the size-factor-normalized counts.
(AERNAS1SEnorm <- SummarizedExperiment(assays = list(counts = normalized_counts),
colData = temp_coldat_clin,
rowRanges = aernas1_bulkRNA_rowRanges,
metadata = "Athero-Express RNA Study 1: bulk RNA sequencing of carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization."))
class: RangedSummarizedExperiment
dim: 21835 622
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(622): ae1 ae1026 ... ae998 ae999
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAS1SEnorm), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.608pts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.608pts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.622pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS1SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.622pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS1SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.622pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS1SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.622pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Transform the counts using vst (variance-stabilizing transformation), which returns values on a log2-like scale.
(AERNAS1SEvst <- SummarizedExperiment(assays = list(counts = assay(AERNA1vsd)),
colData = temp_coldat_clin,
rowRanges = aernas1_bulkRNA_rowRanges,
metadata = "Athero-Express RNA Study 1: bulk RNA sequencing of carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization. log-transformed."))
class: RangedSummarizedExperiment
dim: 21835 622
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(622): ae1 ae1026 ... ae998 ae999
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAS1SEvst), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.608pts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.608pts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.622pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS1SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.622pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS1SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.622pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS1SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.622pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
AERNA2dds <- DESeqDataSet(AERNAS2SE, design = ~ 1)
# Determine the size factors to use for normalization
AERNA2dds <- estimateSizeFactors(AERNA2dds)
# sizeFactors(AERNA2dds)
# Extract the normalized counts
normalized_counts <- counts(AERNA2dds, normalized = TRUE)
# head(normalized_counts)
# Log transform counts for QC
AERNA2vsd <- vst(AERNA2dds, blind = TRUE)
# There is a message stating the following.
#
# -- note: fitType='parametric', but the dispersion trend was not well captured by the
# function: y = a/x + b, and a local regression fit was automatically substituted.
# specify fitType='local' or 'mean' to avoid this message next time.
#
# No action is required.
#
# For more information check: https://www.biostars.org/p/119115/
We will create a list of samples that should be included based on CEA and the proper informed consent (‘academic’). We will save the SummarizedExperiment as an RDS file for easy loading. The clinical data will also be saved as a separate CSV file.
> meta data
temp_coldat <- data.frame(STUDY_NUMBER = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]),
SampleType = "plaque", RNAseqTech = "CEL2-seq", RNAseqType = "3' RNAseq", RNAseqQC = "UMI-corrected",
StudyType = "CEA", StudyName = "AERNAS2", StudyBiobank = "Athero-Express Biobank Study (AE)", SampleSize = "471",
InformedConsent = "ACADEMIC",
row.names = names(aernas2_counts_raw_qc_umicorr_annotFilt[,10:480]))
cat(" > clinical data\n")
 > clinical data
# aernas2_meta_clin_COMMERCIAL <- subset(aernas2_meta_clin, select = c("STUDY_NUMBER", basetable_vars))
aernas2_meta_clin_ACADEMIC <- subset(aernas2_meta_clin, select = c("STUDY_NUMBER", basetable_vars))
# temp_coldat_clin <- merge(temp_coldat, aernas2_meta_clin_COMMERCIAL, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", sort = FALSE, all.x = TRUE)
temp_coldat_clin <- merge(temp_coldat, aernas2_meta_clin_ACADEMIC, by.x = "STUDY_NUMBER", by.y = "STUDY_NUMBER", sort = FALSE, all.x = TRUE)
rownames(temp_coldat_clin) <- temp_coldat_clin$STUDY_NUMBER
dim(temp_coldat_clin)
[1] 471 69
temp <- as_tibble(subset(colData(AERNAS2SE), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2.CEA.471pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS2SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2.CEA.471pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS2SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2.CEA.471pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS2SE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2.CEA.471pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Next, we store the size-factor-normalized counts.
(AERNAS2SEnorm <- SummarizedExperiment(assays = list(counts = normalized_counts),
colData = temp_coldat_clin,
rowRanges = aernas2_bulkRNA_rowRanges,
metadata = "Athero-Express RNA Study 2: bulk RNA sequencing of carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization."))
class: RangedSummarizedExperiment
dim: 21843 471
metadata(1): ''
assays(1): counts
rownames(21843): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000281861
rowData names(2): feature_id symbol
colnames(471): ae105 ae1078 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAS2SEnorm), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.xxxpts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.xxxpts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.471pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS2SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.471pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS2SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.471pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS2SEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.471pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Transform the counts using vst (variance-stabilizing transformation), which returns values on a log2-like scale.
(AERNAS2SEvst <- SummarizedExperiment(assays = list(counts = assay(AERNA2vsd)),
colData = temp_coldat_clin,
rowRanges = aernas2_bulkRNA_rowRanges,
metadata = "Athero-Express RNA Study 2: bulk RNA sequencing of carotid plaques. Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization. log-transformed."))
class: RangedSummarizedExperiment
dim: 21843 471
metadata(1): ''
assays(1): counts
rownames(21843): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000281861
rowData names(2): feature_id symbol
colnames(471): ae105 ae1078 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAS2SEvst), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.xxxpts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.xxxpts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.471pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAS2SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.471pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAS2SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.471pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAS2SEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.471pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
AERNAScombodds <- DESeqDataSet(AERNAScomboSE, design = ~ 1)
# Determine the size factors to use for normalization
AERNAScombodds <- estimateSizeFactors(AERNAScombodds)
# sizeFactors(AERNAScombodds)
# Extract the normalized counts
normalized_counts <- counts(AERNAScombodds, normalized = TRUE)
# head(normalized_counts)
# Log transform counts for QC
AERNAScombovsd <- vst(AERNAScombodds, blind = TRUE)
# There is a message stating the following.
#
# -- note: fitType='parametric', but the dispersion trend was not well captured by the
# function: y = a/x + b, and a local regression fit was automatically substituted.
# specify fitType='local' or 'mean' to avoid this message next time.
#
# No action is required.
#
# For more information check: https://www.biostars.org/p/119115/
We will create a list of samples that should be included based on CEA and the proper informed consent (‘academic’). We will save the SummarizedExperiment as an RDS file for easy loading. The clinical data will also be saved as a separate CSV file.
We extract the meta- and clinical data from the SummarizedExperiment.
[1] 1093 69
temp <- as_tibble(subset(colData(AERNAScomboSE), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScombo.CEA.1093pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAScomboSE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScombo.CEA.1093pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAScomboSE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScombo.CEA.1093pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAScomboSE))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScombo.CEA.1093pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Next, we store the size-factor-normalized counts.
(AERNAScomboSEnorm <- SummarizedExperiment(assays = list(counts = normalized_counts),
colData = temp_coldat_clin,
rowRanges = combined_counts_rowRanges,
metadata = "Athero-Express RNAseq Study Combined: bulk RNA sequencing in carotid plaques across two experiments, AERNAS1 (n=622) and AERNAS2 (n=471). Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization."))
class: RangedSummarizedExperiment
dim: 21835 1093
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(1093): ae1 ae1026 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAScomboSEnorm), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.xxxpts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.xxxpts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.1093pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAScomboSEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.1093pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAScomboSEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.1093pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAScomboSEnorm))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.1093pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Transform the counts using vst (variance-stabilizing transformation), which returns values on a log2-like scale.
(AERNAScomboSEvst <- SummarizedExperiment(assays = list(counts = assay(AERNAScombovsd)),
colData = temp_coldat_clin,
rowRanges = combined_counts_rowRanges,
metadata = "Athero-Express RNAseq Study Combined: bulk RNA sequencing in carotid plaques across two experiments, AERNAS1 (n=622) and AERNAS2 (n=471). Technology: CEL2-seq adapted for bulk RNA sequencing, thus 3'-focused. UMI-corrected. Size corrected normalization. log-transformed."))
class: RangedSummarizedExperiment
dim: 21835 1093
metadata(1): ''
assays(1): counts
rownames(21835): ENSG00000000005 ENSG00000000419 ... ENSG00000291237 ENSG00000274714
rowData names(2): feature_id symbol
colnames(1093): ae1 ae1026 ... ae986 ae992
colData names(69): STUDY_NUMBER SampleType ... PCSK9_plasma PCSK9_plasma_rankNorm
temp <- as_tibble(subset(colData(AERNAScomboSEvst), select = c("STUDY_NUMBER", "SampleType", "RNAseqTech", "RNAseqType", "RNAseqQC",
"StudyType", "StudyName", "StudyBiobank", "SampleSize",
"InformedConsent")))
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.xxxpts.samplelist.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
#
# temp <- as.tibble(colData(AERNA1SE))
#
# fwrite(temp,
# file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.xxxpts.clinicaldata.after_qc.IC_commercial.csv"),
# sep = ",", row.names = FALSE, col.names = TRUE,
# showProgress = TRUE)
# rm(temp)
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.1093pts.samplelist.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(colData(AERNAScomboSEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.1093pts.clinicaldata.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(assay(AERNAScomboSEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.1093pts.assay.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
temp <- as_tibble(rowRanges(AERNAScomboSEvst))
fwrite(temp,
file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.1093pts.rowRanges.after_qc.IC_academic.csv"),
sep = ",", row.names = FALSE, col.names = TRUE,
showProgress = TRUE)
rm(temp)
Here we do a sanity check and compare the expression of a favorite gene, PCSK9, in AERNAS1 across the raw, normalized, and vst-transformed counts.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS1SE), AERNAS1SE@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nraw counts | AERNAS1",
color = "white", fill = uithof_color[8],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS1SEnorm), AERNAS1SEnorm@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nnormalized, size corrected counts | AERNAS1",
color = "white", fill = uithof_color[17],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS1SEvst), AERNAS1SEvst@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nlog-transformed, size corrected counts | AERNAS1",
color = "white", fill = uithof_color[20],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
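The histogram calls above pick the PCSK9 row by gene symbol and transpose it, which is why the plotted column is named after the Ensembl ID (`x = "ENSG00000169174"`) rather than the symbol. A minimal base R sketch of that selection on a toy matrix follows. The expression values, the second row ID and the `GENE_B` symbol are placeholders.

```r
# Toy assay: rows are gene IDs, columns are samples (made-up values).
toy_assay <- matrix(c(3, 0, 7,
                      12, 9, 15),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("ENSG00000169174", "ENSG_PLACEHOLDER"),
                                    c("ae1", "ae2", "ae3")))

# Gene symbols in the same row order as the assay ("GENE_B" is a placeholder).
toy_symbols <- c("PCSK9", "GENE_B")

# Keep the PCSK9 row and transpose: one row per sample, one column per gene.
pcsk9_expr <- as.data.frame(t(toy_assay[toy_symbols == "PCSK9", , drop = FALSE]))

# The single column carries the row name of the selected gene.
colnames(pcsk9_expr)
```

Using `drop = FALSE` keeps the selection a matrix even when only one gene matches, so the transpose still yields a column with the gene ID as its name.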
Here we do the same sanity check for PCSK9 in AERNAS2.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS2SE), AERNAS2SE@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nraw counts | AERNAS2",
color = "white", fill = uithof_color[8],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS2SEnorm), AERNAS2SEnorm@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nnormalized, size corrected counts | AERNAS2",
color = "white", fill = uithof_color[17],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
ggpubr::gghistogram(as_tibble(t(subset(assay(AERNAS2SEvst), AERNAS2SEvst@rowRanges$symbol == "PCSK9"))),
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nlog-transformed, size corrected counts | AERNAS2",
color = "white", fill = uithof_color[20],
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(color = uithof_color[3]),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
Warning: `geom_vline()`: Ignoring `mapping` because `xintercept` was provided.
Warning: `geom_vline()`: Ignoring `data` because `xintercept` was provided.
Here we do the same sanity check for PCSK9 in the combined dataset, with the histograms colored by study (StudyName).
temp <- as.data.frame(colData(AERNAScomboSE))
temp2 <- cbind(as_tibble(t(subset(assay(AERNAScomboSE), AERNAScomboSE@rowRanges$symbol == "PCSK9"))), temp)
ggpubr::gghistogram(temp2,
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nraw counts | AERNASCombined",
color = "white", fill = "StudyName", palette = "npg", # c(uithof_color[6], uithof_color[20]),
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(linetype = 2),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
ggsave(filename = paste0(PLOT_loc, "/", Today, ".PCSK9_ENSG00000169174_GRCh38p13_EnsDb86.AERNACombinedRAW.CEA.1093pts.pdf"), device = "pdf",
dpi = 300, width = 12, height = 7, plot = last_plot())
temp <- as.data.frame(colData(AERNAScomboSEnorm))
temp2 <- cbind(as_tibble(t(subset(assay(AERNAScomboSEnorm), AERNAScomboSEnorm@rowRanges$symbol == "PCSK9"))), temp)
ggpubr::gghistogram(temp2,
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nnormalized, size corrected counts | AERNASCombined",
color = "white", fill = "StudyName", palette = "npg", # c(uithof_color[6], uithof_color[20]),
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(linetype = 2),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
ggsave(filename = paste0(PLOT_loc, "/", Today, ".PCSK9_ENSG00000169174_GRCh38p13_EnsDb86.AERNACombinedNORM.CEA.1093pts.pdf"), device = "pdf",
dpi = 300, width = 12, height = 7, plot = last_plot())
temp <- as.data.frame(colData(AERNAScomboSEvst))
temp2 <- cbind(as_tibble(t(subset(assay(AERNAScomboSEvst), AERNAScomboSEvst@rowRanges$symbol == "PCSK9"))), temp)
ggpubr::gghistogram(temp2,
x = "ENSG00000169174",
xlab = "PCSK9 (ENSG00000169174) expression\nlog-transformed, size corrected counts | AERNASCombined",
color = "white", fill = "StudyName", palette = "npg", # c(uithof_color[6], uithof_color[20]),
rug = FALSE, add_density = FALSE,
add = c("median"),
add.params = list(linetype = 2),
ggtheme = theme_pubclean())
Warning: Using `bins = 30` by default. Pick better value with the argument `bins`.
ggsave(filename = paste0(PLOT_loc, "/", Today, ".PCSK9_ENSG00000169174_GRCh38p13_EnsDb86.AERNACombinedVST.CEA.1093pts.pdf"), device = "pdf",
dpi = 300, width = 12, height = 7, plot = last_plot())
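The output file names in this notebook embed the sample count by hand (for example `622pts` or `471pts`), which is easy to get out of sync when a block is copied between datasets. One possible safeguard, sketched below, is a small helper that takes the count from the object itself. `make_out_file()`, the `"OUTPUT"` directory and the `"20241031"` date string are hypothetical and only mirror the naming pattern used here.

```r
# Hypothetical helper: build an output file name with the sample count taken
# from the data rather than typed by hand.
make_out_file <- function(out_loc, today, label, n_samples, part,
                          consent = "IC_academic", ext = "csv") {
  file.path(out_loc,
            paste0(today, ".", label, ".CEA.", n_samples, "pts.", part,
                   ".after_qc.", consent, ".", ext))
}

# Toy stand-in for the assay of a SummarizedExperiment: 2 genes x 471 samples.
toy_assay <- matrix(0, nrow = 2, ncol = 471)

out_file <- make_out_file(out_loc = "OUTPUT", today = "20241031",
                          label = "AERNAS2", n_samples = ncol(toy_assay),
                          part = "assay")
out_file
```

The same helper could name the RDS files by passing `part = "SE"` and `ext = "RDS"`.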
# saveRDS(AERNA1SE, file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.608pts.SE.after_qc.IC_commercial.RDS"))
saveRDS(AERNAS1SE, file = paste0(OUT_loc, "/", Today, ".AERNAS1.CEA.622pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAS1SEnorm, file = paste0(OUT_loc, "/", Today, ".AERNAS1SEnorm.CEA.622pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAS1SEvst, file = paste0(OUT_loc, "/", Today, ".AERNAS1SEvst.CEA.622pts.SE.after_qc.IC_academic.RDS"))
# saveRDS(AERNA2SE, file = paste0(OUT_loc, "/", Today, ".AERNA.CEA.xxxpts.SE.after_qc.IC_commercial.RDS"))
saveRDS(AERNAS2SE, file = paste0(OUT_loc, "/", Today, ".AERNAS2.CEA.471pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAS2SEnorm, file = paste0(OUT_loc, "/", Today, ".AERNAS2SEnorm.CEA.471pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAS2SEvst, file = paste0(OUT_loc, "/", Today, ".AERNAS2SEvst.CEA.471pts.SE.after_qc.IC_academic.RDS"))
# saveRDS(AERNA2SE, file = paste0(OUT_loc, "/", Today, ".AERNAScomboSE.CEA.xxxpts.SE.after_qc.IC_commercial.RDS"))
saveRDS(AERNAScomboSE, file = paste0(OUT_loc, "/", Today, ".AERNAScomboSE.CEA.1093pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAScomboSEnorm, file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEnorm.CEA.1093pts.SE.after_qc.IC_academic.RDS"))
saveRDS(AERNAScomboSEvst, file = paste0(OUT_loc, "/", Today, ".AERNAScomboSEvst.CEA.1093pts.SE.after_qc.IC_academic.RDS"))
Version: v1.2.0
Last update: 2024-01-09
Written by: Sander W. van der Laan (s.w.vanderlaan-2[at]umcutrecht.nl).
Description: Script to load bulk RNA sequencing data and perform gene expression analyses and visualisations.
Minimum requirements: R version 3.5.2 (2018-12-20) -- 'Eggshell Igloo', macOS Mojave (10.14.2).
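The minimum R version stated above could be enforced at the top of the script. The following is a minimal sketch using base R only; the variable names `min_r` and `r_ok` are hypothetical and do not appear in the original notebook.

```r
# Minimum version taken from the "Minimum requirements" line above.
min_r <- "3.5.2"

# getRversion() returns a version object that can be compared with a string.
r_ok <- getRversion() >= min_r

if (!r_ok) {
  stop("This script requires R >= ", min_r,
       "; running R ", as.character(getRversion()), ".")
}
```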
**MoSCoW To-Do List**
The things we Must, Should, Could, and Would have given the time we have.
_M_
_S_
_C_
_W_
**Changes log**
* v1.2.0 Major overhaul to prepare bulkRNAseq with new data.
* v1.1.1 Fixed baseline table writing. Additional versions of saved data. Added example to 'melt' data using `mia`.
* v1.1.0 Update to bulk RNAseq data - deeper sequencing data is now available. Update to the study database.
* v1.0.1 Fixes to annotation. Fix to loading clinical dataset.
* v1.0.0 Initial version. Update to the count data, gene list. Filter samples based on artery operated (CEA) and informed consent. Added heatmap of correlation between target genes.
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS 15.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 grid tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] annotables_0.2.0 EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.28.1
[4] AnnotationFilter_1.28.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 DESeq2_1.44.0
[7] SummarizedExperiment_1.34.0 MatrixGenerics_1.16.0 matrixStats_1.4.1
[10] Hmisc_5.1-3 survminer_0.4.9 survival_3.7-0
[13] GGally_2.2.1 PerformanceAnalytics_2.0.4 xts_0.14.0
[16] zoo_1.8-12 ggcorrplot_0.1.4.999 corrr_0.4.4
[19] reshape2_1.4.4 bacon_1.32.0 ellipse_0.5.0
[22] BiocParallel_1.38.0 meta_7.0-0 metadat_1.2-0
[25] qqman_0.1.9 cowplot_1.1.3 RColorBrewer_1.1-3
[28] rmarkdown_2.28 Seurat_5.1.0 SeuratObject_5.0.2
[31] sp_2.1-4 BiocManager_1.30.25 EnhancedVolcano_1.22.0
[34] ggrepel_0.9.6 mygene_1.40.0 txdbmaker_1.0.1
[37] GenomicFeatures_1.56.0 GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
[40] org.Hs.eg.db_3.19.1 AnnotationDbi_1.66.0 IRanges_2.38.1
[43] S4Vectors_0.42.1 Biobase_2.64.0 BiocGenerics_0.50.0
[46] tidylog_1.1.0 patchwork_1.3.0.9000 labelled_2.13.0
[49] sjPlot_2.8.16 UpSetR_1.4.0 ggpubr_0.6.0.999
[52] forestplot_3.1.5 abind_1.4-8 checkmate_2.3.2
[55] pheatmap_1.0.12 devtools_2.4.5 usethis_3.0.0
[58] BlandAltmanLeh_0.3.1 tableone_0.13.2 openxlsx_4.2.7.1
[61] haven_2.5.4 eeptools_1.2.5 DT_0.33
[64] knitr_1.48 lubridate_1.9.3 forcats_1.0.0
[67] stringr_1.5.1 purrr_1.0.2 tibble_3.2.1
[70] ggplot2_3.5.1 tidyverse_2.0.0 data.table_1.16.2
[73] naniar_1.1.0 tidyr_1.3.1 dplyr_1.1.4
[76] optparse_1.7.5 readr_2.1.5 pander_0.6.5
[79] R.utils_2.12.3 R.oo_1.26.0 R.methodsS3_1.8.2
[82] worcs_0.1.15 credentials_2.0.2
loaded via a namespace (and not attached):
[1] igraph_2.0.3 ica_1.0-3 plotly_4.10.4 Formula_1.2-5 zlibbioc_1.50.0
[6] gert_2.1.4 tidyselect_1.2.1 bit_4.5.0 lattice_0.22-6 rjson_0.2.23
[11] blob_1.2.4 urlchecker_1.0.1 S4Arrays_1.4.1 parallel_4.4.1 png_0.1-8
[16] tinytex_0.53 cli_3.6.3 ProtGenerics_1.36.0 askpass_1.2.1 sjstats_0.19.0
[21] openssl_2.2.2 goftest_1.2-3 textshaping_0.4.0 BiocIO_1.14.0 uwot_0.2.2
[26] curl_5.2.3 mime_0.12 evaluate_1.0.1 leiden_0.4.3.1 gsubfn_0.7
[31] stringi_1.8.4 backports_1.5.0 XML_3.99-0.17 httpuv_1.6.15 magrittr_2.0.3
[36] rappdirs_0.3.3 splines_4.4.1 getopt_1.20.4 KMsurv_0.1-5 sctransform_0.4.1
[41] ggbeeswarm_0.7.2 sessioninfo_1.2.2 DBI_1.2.3 jquerylib_0.1.4 withr_3.0.1
[46] class_7.3-22 systemfonts_1.1.0 lmtest_0.9-40 rtracklayer_1.64.0 htmlwidgets_1.6.4
[51] fs_1.6.4 biomaRt_2.60.1 labeling_0.4.3 gh_1.4.1 SparseArray_1.4.8
[56] ranger_0.16.0 reticulate_1.39.0 XVector_0.44.0 UCSC.utils_1.0.0 timechange_0.3.0
[61] fansi_1.0.6 calibrate_1.7.7 RSpectra_0.16-2 irlba_2.3.5.1 ggrastr_1.0.2
[66] fastDummies_1.7.4 ellipsis_0.3.2 lazyeval_0.2.2 yaml_2.3.10 scattermore_1.2
[71] crayon_1.5.3 RcppAnnoy_0.0.22 progressr_0.14.0 later_1.3.2 ggridges_0.5.6
[76] codetools_0.2-20 base64enc_0.1-3 profvis_0.4.0 sjlabelled_1.2.0 KEGGREST_1.44.1
[81] Rtsne_0.17 limma_3.60.6 Rsamtools_2.20.0 filelock_1.0.3 rticles_0.27
[86] foreign_0.8-87 sqldf_0.4-11 pkgconfig_2.0.3 xml2_1.3.6 spatstat.univar_3.0-1
[91] mathjaxr_1.6-0 GenomicAlignments_1.40.0 spatstat.sparse_3.1-0 viridisLite_0.4.2 performance_0.12.3
[96] xtable_1.8-4 car_3.1-3 plyr_1.8.9 httr_1.4.7 globals_0.16.3
[101] sys_3.4.3 pkgbuild_1.4.4 beeswarm_0.4.0 htmlTable_2.4.3 broom_1.0.7
[106] nlme_3.1-166 dbplyr_2.5.0 survMisc_0.5.6 crosstalk_1.2.1 ggeffects_1.7.2
[111] lme4_1.1-35.5 digest_0.6.37 numDeriv_2016.8-1.1 Matrix_1.7-0 farver_2.1.2
[116] tzdb_0.4.0 rpart_4.1.23 glue_1.8.0 cachem_1.1.0 BiocFileCache_2.12.0
[121] polyclip_1.10-7 generics_0.1.3 Biostrings_2.72.1 visdat_0.6.0 CompQuadForm_1.4.3
[126] proto_1.0.0 presto_1.0.0 survey_4.4-2 parallelly_1.38.0 pkgload_1.4.0
[131] statmod_1.5.0 arm_1.14-4 RcppHNSW_0.6.0 ragg_1.3.3 carData_3.0-5
[136] minqa_1.2.8 pbapply_1.7-2 httr2_1.0.5 spam_2.11-0 utf8_1.2.4
[141] mitools_2.4 sjmisc_2.8.10 datawizard_0.13.0 ggsignif_0.6.4 gridExtra_2.3
[146] shiny_1.9.1 GenomeInfoDbData_1.2.12 clisymbols_1.2.0 RCurl_1.98-1.16 memoise_2.0.1
[151] scales_1.3.0 future_1.34.0 RANN_2.6.2 renv_1.0.11 km.ci_0.5-6
[156] spatstat.data_3.1-2 rstudioapi_0.16.0 cluster_2.1.6 spatstat.utils_3.1-0 hms_1.1.3
[161] fitdistrplus_1.2-1 munsell_0.5.1 colorspace_2.1-1 quadprog_1.5-8 rlang_1.1.4
[166] dotCall64_1.2 xfun_0.48 prereg_0.6.0 coda_0.19-4.1 e1071_1.7-16
[171] metafor_4.6-0 remotes_2.5.0 ggsci_3.2.0 bitops_1.0-9 promises_1.3.0
[176] RSQLite_2.3.7 DelayedArray_0.30.1 proxy_0.4-27 compiler_4.4.1 prettyunits_1.2.0
[181] boot_1.3-31 listenv_0.9.1 Rcpp_1.0.13 tensor_1.5 MASS_7.3-61
[186] progress_1.2.3 insight_0.20.5 spatstat.random_3.3-2 R6_2.5.1 fastmap_1.2.0
[191] rstatix_0.7.2 vipor_0.4.7 ROCR_1.0-11 ggstats_0.7.0 vcd_1.4-13
[196] nnet_7.3-19 gtable_0.3.5 KernSmooth_2.23-24 miniUI_0.1.1.1 deldir_2.0-4
[201] htmltools_0.5.8.1 bit64_4.5.2 spatstat.explore_3.3-2 lifecycle_1.0.4 zip_2.3.1
[206] nloptr_2.1.1 restfulr_0.0.15 sass_0.4.9 vctrs_0.6.5 spatstat.geom_3.3-3
[211] future.apply_1.11.2 bslib_0.8.0 pillar_1.9.0 locfit_1.5-9.10 jsonlite_1.8.9
[216] chron_2.3-61
rm(normalized_counts,
id, id2,
temp_coldat)
combined_meta_clin_ACADEMIC = temp_coldat_clin
combined_meta = temp_coldat_merge
rm(AERNAScombovsd,AERNAScombodds,
AERNA1vsd, AERNA1dds,
AERNA2vsd, AERNA2dds,
aernas1_counts, aernas1_counts_raw_qc_umicorr, aernas1_counts_raw_qc_umicorr_annot,
aernas2_counts, aernas2_counts_raw_qc_umicorr, aernas2_counts_raw_qc_umicorr_annot,
AEDB_AERNAS1_filt, AEDB_AERNAS2_filt,
combined_counts)
| © 1979-2024 Sander W. van der Laan | s.w.vanderlaan[at]gmail[dot]com | vanderlaanand.science. |